ABSTRACT

We investigate the possible link between mergers and the enhanced activity of
supermassive black holes (SMBHs) at the centre of galaxies, by comparing the
merger fraction of a local sample (0.003 ≤ z < 0.03) of active galaxies - 59
active galactic nuclei (AGN) host galaxies selected from the all-sky Swift BAT
(Burst Alert Telescope) survey - with an appropriate control sample (247
sources extracted from the Hyperleda catalogue) that has the same redshift
distribution as the BAT sample. We detect the interacting systems in the two
samples on the basis of non-parametric structural indexes of concentration
(C), asymmetry (A), clumpiness (S), Gini coefficient (G) and second order
momentum of light (M20). In particular, we propose a new morphological
criterion, based on a combination of all these indexes, that improves the
identification of interacting systems. We also present a new software -
PyCASSo (Python CAS Software) - for the automatic computation of the
structural indexes. After correcting for the completeness and reliability of
the method, we find that the fraction of interacting galaxies among the active
population (~20 per cent) exceeds the merger fraction of the control sample
(~4 per cent). Choosing a mass-matched control sample leads to equivalent
results, although with slightly lower statistical significance. Our findings
support the scenario in which mergers trigger the nuclear activity of
supermassive black holes.

Key words: galaxies: active - galaxies: interaction. 



1 INTRODUCTION 

Observations indicate that the growth history of supermassive black holes
(SMBHs, M_BH > 10^6 M_⊙) is closely connected to that of their host galaxies.
The discovery of scaling relations, linking the black hole mass to properties
of the host in the local Universe, hints at a scenario of galaxy-SMBH
symbiotic evolution (Magorrian et al. 1998; Ferrarese & Merritt 2000;
Gebhardt et al. 2000; Marconi & Hunt 2003; Häring & Rix 2004;
Ferrarese & Ford 2005; Graham 2012a; Graham 2012b). In particular, the near
ubiquity of SMBHs in massive spheroids indicates that black hole growth,
mainly driven by gas accretion (e.g. Marconi et al. 2004; Croton et al. 2006;
Volonteri & Bellovary; Merloni & Heinz 2012), is favored in galaxies where the
importance of organized rotation both in the gaseous and stellar component is
weak. As morphological properties of galaxies are likely to be determined by
their complex assembly history and can be transient features, the processes
that determine the formation and evolution of galaxies go hand in hand with
the formation and evolution of SMBHs, and in particular with their fueling.

Theoretical models indicate that galaxy formation and evolution is driven by
accretion of gas from the cosmic environment (e.g. Keres et al. 2005;
Bournaud et al. 2005; Mapelli, Moore & Bland-Hawthorn 2008; see
Sancisi et al. 2008 for a review) and by halo-halo interactions, both
involv[...] 2001 for a review). More recently, the mode of gas accretion has
been recognized as playing a potentially critical role in shaping galaxies
(Sales et al. 2012), leaving open the possibility that spheroids form via
multiple episodes of misaligned gas inflows, besides major mergers. In the
absence of a broad consensus, observations of AGNs and of their galaxy hosts,
from suitably selected samples, can provide clues on the mechanisms triggering
the SMBH activity, and on their coevolution.

A longstanding issue is how the gas can lose enough
angular momentum from the large scale (~0.1-100 kpc)
down to the SMBH's horizon scale (~10^-5 pc). A possible
scenario involves gravitational perturbations due to tidal interactions
between galaxies in close fly-bys (on ~10-70 kpc scales) and/or violent galaxy
mergers occurring on smaller scales of ~kpc or less. These perturbations may
drive large quantities of gas towards the centre of the merger remnant
(e.g. Kauffmann & Haehnelt 2000; Springel et al. 2005; Hopkins et al. 2006).
This accumulated gas may induce both an intense starburst phase and an
enhanced nuclear activity (active SMBH), whose feedback, in turn, can act as a
mechanism to regulate subsequent star formation and accretion
(Churazov et al. 2001; Best et al. 2006; Schawinski et al. 2006;
Schawinski et al. 2007; McNamara & Nulsen 2007). Galaxy interactions/mergers
should therefore be responsible not only for large scale (≳ 10^3 pc)
morphological distortions, but also for the inflow of gas down to the typical
scale of SMBH accretion (≲ 10^-4 pc).

If SMBH activity is triggered, at least partially, by galaxy mergers, the
fraction of galaxies with clear signs of being the result of
interactions/mergers should be statistically higher in a sample of AGN-host
galaxies than in a sample of field galaxies. This and other similar
observational tests have been carried out in the last few years with somewhat
contrasting results (see e.g. Petrosian 1982; Dahari 1984; Dahari 1985;
Keel et al. 1985; Fuentes-Williams & Stocke [...] Koulouridis et al. 2006;
Serber et al. 2006). In particular, while some studies claim a connection
between nuclear activity and the presence of close companions or tidal
distortions (e.g., Dahari [...]

The differences between various studies might be due to biases in the choice
of the galaxy sample. For example, obscured AGNs can be missed in studies
based on optical emission-line ratios, optical spectral classification or even
soft X-ray fluxes. Among the aforementioned studies, only Koss et al. (2010)
use a sample of hard X-ray selected AGNs, and find a strong excess of merging
systems with respect to a control sample.

Another source of error is counting chance superposition galaxy pairs as
physically interacting galaxies (for more details about this source of error
we refer the reader to section 6.1 of Ellison et al. 2011).

The third source of bias is the possible time delay between the merger and the
switch-on of the nuclear activity. Various studies (e.g. Ellison et al. 2008;
Schawinski et al. 2009; Schawinski et al. 2010 and references therein) find
empirical evidence that mergers enhance star formation first, and only at
later epochs trigger the AGN phase (~500 Myr after the starburst). In fact,
Smirnova, Moiseev & Afanasiev (2010) analyse a sample of apparently isolated
Seyfert galaxies and find that about 35 per cent of them show tidal tails,
consistent with a gas-rich merger (likely a minor merger) in the last
0.5-1 Gyr. Thus, samples of galaxy pairs might miss, by construction, late
merger phases and gas-rich minor mergers. This problem is less acute when
empirical measures of galaxy morphology are used, as they can identify a
galaxy as the result of an interaction/merger even when it lacks a companion
(provided that interaction features are strong enough). Therefore, these
measures are sensitive both to the initial and the late stages of mergers, and
are less biased against specific merger phases.

In this paper we re-address the possible link between mergers/interactions
(in the following, we will use the two terms as synonyms) and SMBH activity,
by comparing the merger fraction of an AGN host galaxy sample to the typical
merger fraction of galaxies in the local Universe.

To satisfy the need that both the galaxy sample and the method of analysis are
as unbiased as possible, (i) we use a hard (> 10 keV) X-ray selected AGN
sample (so as not to miss obscured sources, with the partial exception of the
heavily absorbed Compton thick AGNs, i.e. those sources with absorbing column
densities exceeding 10^24 cm^-2), and (ii) we adopt a non-parametric
morphological analysis (to identify truly interacting galaxies even in late
merger phases).

Moreover, we propose an improved technique for eval- 
uating the merger fraction of a galaxy sample by using a 
method that is objective, reliable and fast, so that it can 
be applied, in the future, to larger samples of galaxies; we 
also define the completeness and the reliability coefficients, 
that allow a statistical correction of the merger fraction and 
further reduce possible residual errors in the automated clas- 
sification. 

This paper is organized as follows: Section 2 presents the galaxy samples and
the procedure adopted for their unbiased selection; Section 3 explains the
non-parametric morphological method used for the analysis; Section 4 presents
our estimates of the merger fraction of the AGN BAT sample and of the control
sample; Section 5 outlines a summary of the most important points. Appendices
A and B present respectively the data processing algorithms (including a
detailed description of the software that we developed for our automated
classification) and a discussion on the image degradation effects that affect
data analysis.



2 SAMPLE SELECTION 

The aim of this work is to study the possible link between mergers and SMBH
activity, by comparing the merger fraction of an AGN host galaxy sample to the
typical merger fraction of galaxies in the local Universe. To this purpose, we
select two samples: the first one is a hard (15-195 keV) X-ray selected sample
of active galaxies (hereafter the BAT sample), which is similar - with several
objects in common - to the sample already used in Koss et al. (2010). The
second one is an optically selected control sample of galaxies (without any
requirement on their active nature) that we extract from the Hyperleda
catalogue (Paturel et al. 2003). We impose on both samples a minimum redshift
of 0.003, to avoid too extended sources (image processing faces some
difficulties in these cases) and a maximum redshift of 0.03, because the
optical counterparts of the selected galaxies need to match the requirements
for our morphological analysis (see Appendix A3).

2.1 BAT sample 

The Burst Alert Telescope (BAT) is a coded aperture imaging camera on-board
the Swift satellite (Gehrels et al. 2004); it has a wide field of view
(1.4 steradian), a PSF of 17 arcmin (FWHM) and it operates in the 15-195 keV
energy range. To select a sample of AGNs out of Swift BAT observations, we
adopt the Palermo Swift-BAT Hard X-ray catalogue (Cusumano et al. 2010), which
collects the data relative to the first 54 months of the Swift mission and is
therefore one of the most complete, well defined and extended catalogues of
hard X-ray sources to date. It contains 1256 sources with a signal to noise
ratio greater than 4.8, a flux limit of 6.0 × 10^-12 erg cm^-2 s^-1 and a
counterpart identification with a 95 per cent confidence level. This catalogue
represents a relatively unbiased sample of AGNs, because it is based on a hard
X-ray band, where biases against absorbed AGNs are less important.
For our analysis, we first extract from this catalogue a complete sample of
523 sources, with absolute Galactic latitude |b| > 15°, S/N > 5 and flux
greater than 8.0 × 10^-12 erg cm^-2 s^-1. Second, we select a complete
sub-sample in the redshift interval 0.003 ≤ z < 0.03 and, finally, we restrict
to the area of sky covered by the Sloan Digital Sky Survey Data Release 8
(http://www.sdss3.org/dr8/), to make use of the optical data offered by this
survey. The final BAT active galaxy sample^1 consists of 59 sources (15 at
redshift 0.003 ≤ z < 0.01, 16 at redshift 0.01 ≤ z < 0.02 and 28 at redshift
0.02 ≤ z < 0.03), which represent ~35 per cent of the total number of galaxies
belonging to the complete sample in the same redshift interval 0.003-0.03
(169 objects).
The BAT sample is not a mere selection of galaxies, but rather of systems: the
sources are selected on the basis of the presence of at least one AGN, but the
poor angular resolution of Swift BAT observations does not allow us to
distinguish the possible X-ray emission of multiple AGNs in pairs or groups of
galaxies. As a consequence, in the case of merging galaxies, the ensemble of
objects is considered as a single (interacting) system, just as each isolated
galaxy represents a single (but non-interacting) system. In particular, the
"interacting" or "non-interacting" classification is determined from the
results of the automated structural analysis (see Section 3.2).

1 This sample does not include two sources whose resolution is too low
for them to be analysed and one source that is very close to
a bright star, which invalidates our analysis.



2.2 The control sample 

The control sample is used to evaluate the average merger 
fraction among galaxies and to compare it with the same 
value found in the BAT sample (i.e. among AGNs), so it 
has to match the redshift distribution of the BAT sample 
and it must be unbiased towards interacting or isolated sys- 
tems. 

For example, a random sampling among SDSS galaxies would lead to an
overestimate of the merger fraction, because interacting systems have more
chances to be selected than isolated galaxies (they can be picked through any
one of their members). Therefore, we replicate the particular "system
classification" of the BAT sample also in the control one. In the following we
describe the procedure used to define the control sample:

- We select three random square boxes of sky fully covered by SDSS imaging.
All boxes have a side of 7.5 degrees and contain, on average, ~300 galaxies^2
in the 0.003 ≤ z < 0.03 redshift interval. The choice of multiple medium-size
boxes, instead of a single large box, avoids biases related to a peculiar
local environment (e.g. galaxy groups or clusters). The size of the boxes
ensures a significant number of sources inside each one, so that possible
border effects become unimportant (e.g. the loss of one galaxy of a pair that
lies across the edge of the box).

- We consider all the sources in the Hyperleda catalogue present in the three
boxes of sky quoted above. For each galaxy, we acquire the SDSS image and, on
the basis of the structural parameters (asymmetry, clumpiness, Gini
coefficient, second order momentum of light - see Section 3), we distinguish
whether it is interacting or isolated.

- We switch from the "galaxy classification" to the "system classification":
we consider as a multiple system every ensemble^3 of sources in which at least
one galaxy has been identified as interacting through our classification. We
consider only one galaxy of each system, so that each system is represented
only by one entry. At this point, the control sample consists of 734 systems
(79 at redshift 0.003 ≤ z < 0.01, 67 at redshift 0.01 ≤ z < 0.02 and 588 at
redshift 0.02 ≤ z < 0.03).

- The redshift distribution of these sources is considerably different from
that of the BAT sample, so possible redshift-related effects (e.g. an
evolution of the merger fraction) may significantly alter our comparison. For
this reason, we reduce our sample by randomly extracting, in each redshift
bin, the number of sources needed to match the BAT sample's redshift
distribution.

At the end of this procedure we obtain a redshift-matched control sample of
247 sources, distributed as shown in Table 2, that are fully comparable with
the BAT ones.
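
The redshift matching described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the function names `matched_counts` and `draw_matched` are hypothetical, and the rule used here (scale the BAT bin counts by the largest factor that every control bin can supply, then round) is one plausible reading of "extracting the right number of sources". With the bin counts quoted in the text (15, 16, 28 for BAT; 79, 67, 588 for the control pool) this rule gives 63 + 67 + 117 = 247 objects, the size of the control sample quoted above.

```python
import random

def matched_counts(target_counts, pool_counts):
    """Per-bin draw counts reproducing the target (BAT) bin proportions.

    Assumption: the scale factor is set by the bin in which the pool
    is scarcest relative to the target, so no bin is over-drawn.
    """
    scale = min(p / t for p, t in zip(pool_counts, target_counts))
    return [int(t * scale + 0.5) for t in target_counts]

def draw_matched(pool_z, edges, target_counts, seed=0):
    """Randomly pick pool indices so that the bin proportions match."""
    rng = random.Random(seed)
    bins = [[] for _ in target_counts]
    for idx, z in enumerate(pool_z):
        for k in range(len(edges) - 1):
            if edges[k] <= z < edges[k + 1]:
                bins[k].append(idx)
    n_draw = matched_counts(target_counts, [len(b) for b in bins])
    picked = []
    for members, n in zip(bins, n_draw):
        # sample without replacement inside each redshift bin
        picked.extend(rng.sample(members, min(n, len(members))))
    return sorted(picked)

# Synthetic pool with the bin occupancies quoted in the text.
pool = [0.005] * 79 + [0.015] * 67 + [0.025] * 588
chosen = draw_matched(pool, [0.003, 0.01, 0.02, 0.03], [15, 16, 28])
print(matched_counts([15, 16, 28], [79, 67, 588]), len(chosen))
```

The middle bin is the limiting one: all 67 of its control systems are kept, and the other two bins are thinned to the same proportions as the BAT sample.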



2 We point out that, due to our subselections and the impossibil- 
ity to analyse all the images, the number of valid objects in each 
box is usually reduced almost by 20 per cent. 

3 The extent of the ensemble depends on the number and the
kind of sources falling into the aperture (automatically computed
on the basis of the light profile of the central galaxy, see Appendix
A3) for the estimation of structural indexes. However, in general,
it is unlikely that galaxies with separation greater than 30 kpc
are included in the same aperture.

We point out that the control sample contains both active
and quiescent galaxies at random, because we want to check
whether the merger fraction of the BAT AGN sample is significantly
higher than the typical merger fraction in the local
Universe.



3 DATA ANALYSIS 

In this work we aim to determine the merger fraction of two
samples of galaxies using a method that is objective, reliable
and fast, so that it can be applied, in the future, to larger
[...] method that has been proven to be clearly superior to the
others.

Pair counts require a strong observational effort, because they need redshift
measurements for each galaxy, to avoid chance superpositions. Moreover, even
pairs of galaxies at the same redshift may not be gravitationally bound,
leading to an overestimate of the merger fraction.
Other techniques rely on the identification of galaxies that, due to
gravitational interactions with a close companion, show morphological
perturbations. Visual, qualitative classification is the most used and
accurate method, but it is intrinsically subjective and becomes less and less
reliable with increasing redshift, because of the lower resolution and S/N
ratio. Moreover, it is time consuming and, therefore, not applicable to very
large samples of galaxies.
Quantitative classifications are less accurate but more objective, and allow
corrections for high redshifts, because the image degradation is measurable.
Among these, we can distinguish between parametric and non-parametric
classifications. In the first kind, the projected light distribution of the
galaxy is either fitted as a whole with an analytical model (like the Sersic
or the de Vaucouleurs profile), or it is split into its various components
(e.g. a bulge and a disk), which are fitted separately. Nevertheless, these
methods are quite unsuitable for irregular or disturbed galaxies and, in the
case of close pairs, the subtraction of the extra light coming from the
companion is not trivial. Non-parametric techniques are not based on any
analytical model, so they are equally applicable to every kind of galaxy;
however, it is more difficult to convert their values into physically
meaningful results. An interesting non-parametric classification has been
developed in recent years by Conselice (2003) and Lotz et al. (2004): it
consists of a set of five structural indexes that measure specific properties
of a galaxy. The first three parameters, concentration (C), asymmetry (A) and
clumpiness (S), presented by Conselice (2003), are referred to as the CAS
system; the other two indexes, introduced by Lotz et al. (2004), are the Gini
coefficient (G) and the second order momentum of light (M20). We decide to
adopt this non-parametric approach for our analysis and we will refer to the
whole set of indexes as the CASGM system. As in a visual analysis, the CASGM
method becomes less reliable in case of low resolution or S/N ratio, but these
effects have been well quantified by Lotz et al. (2004) and are reported at
the end of Appendix A3. Taking these limits into account, we have imposed a
maximum redshift of 0.03 on our samples, so that SDSS images meet the minimum
requirements for the automated analysis.

3.1 The CASGM parameters 

In order to compute these parameters, we need first to determine the extension
of the galaxy, which is based on the Petrosian radius. The Petrosian index of
a galaxy is the ratio between the mean surface brightness inside radius R,
μ(r < R), and the surface brightness μ(R) at R, that is:

    \eta(R) = \frac{\mu(r < R)}{\mu(R)}    (1)

The Petrosian radius is the radius r_p at which the inverted Petrosian index
is equal to 0.2 (Petrosian 1976). For the CAS system, the area of the galaxy
is the circular area inside 1.5 times the Petrosian radius r(η = 0.2), with
centre in the point that minimizes the asymmetry of the galaxy.
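
As an illustration of equation (1), the sketch below finds the Petrosian radius of an analytic surface-brightness profile by integrating the curve of growth outwards until the inverted index μ(R)/μ(r < R) drops to 0.2. It is a minimal sketch under stated assumptions: the function name `petrosian_radius`, the trapezoidal integration and the exponential test profile are all illustrative choices, and real images would need circular-aperture photometry on pixels rather than a callable profile.

```python
import math

def petrosian_radius(mu, r_max, eta_inv=0.2, n=20000):
    """Radius at which mu(R) / <mu>(r < R) first falls to eta_inv.

    mu    : callable returning the surface brightness at radius r
    r_max : outer limit of the search
    """
    dr = r_max / n
    flux = 0.0  # running integral of 2*pi*r*mu(r) dr
    for k in range(1, n + 1):
        r_lo, r_hi = (k - 1) * dr, k * dr
        # trapezoidal rule on the integrand 2*pi*r*mu(r)
        flux += math.pi * (r_lo * mu(r_lo) + r_hi * mu(r_hi)) * dr
        mean_mu = flux / (math.pi * r_hi ** 2)
        if mu(r_hi) / mean_mu <= eta_inv:
            return r_hi
    return None  # threshold not reached inside r_max

# Exponential disc with unit scale length (illustrative test profile).
r_p = petrosian_radius(lambda r: math.exp(-r), r_max=10.0)
aperture = 1.5 * r_p  # radius of the CAS aperture, as in the text
print(round(r_p, 2), round(aperture, 2))
```

For an exponential profile the threshold is crossed at about 3.6 scale lengths, so the CAS aperture extends to roughly 5.4 scale lengths.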

• Concentration: the concentration index is the ratio of the light inside an
inner aperture (circular or elliptical) to the light inside an outer aperture.
The CAS system adopts the Bershady et al. (2000) definition, so C is defined
as:

    C = 5 \log_{10}\left(\frac{r_{80}}{r_{20}}\right)    (2)

where r_80 and r_20 are the radii that contain 80 per cent and 20 per cent of
the total light of the galaxy, respectively. Typical values of C range from ~2
to ~5: elliptical galaxies and spheroidal systems usually have C > 4, disk
galaxies have 3 < C < 4, while galaxies with a low surface brightness or a low
velocity dispersion have C ~ 2.

• Asymmetry: the A coefficient measures the degree of asymmetry of the galaxy
light distribution under a 180° rotation. This index was originally used to
describe galaxy morphologies (e.g. Abraham et al. 1996), but we follow the
slightly modified formulation of Conselice (2000b). This index is computed by
subtracting the 180° rotated image from the original one, and by normalizing
the residuals by the total flux of the galaxy. This value is then corrected by
subtracting the asymmetry contribution of the background (e.g. produced by a
luminosity gradient or a close stellar halo), which is computed in the same
way. Therefore, the final value of A is

    A = \frac{\sum_{i,j} |I(i,j) - I_{180}(i,j)|}{\sum_{i,j} |I(i,j)|}
        - \frac{\sum_{i,j} |B(i,j) - B_{180}(i,j)|}{\sum_{i,j} |I(i,j)|}    (3)

where I and B are respectively the original image of the galaxy and of the
background, while I_180 and B_180 are their rotated images. This coefficient
is sensitive to all the processes that introduce a certain degree of asymmetry
in the light distribution, such as star forming regions, dust bands and
mergers. The relative contribution of these elements has been studied by
Conselice (2003), who showed that small scale structures can account for at
most 30 per cent of the asymmetry of the galaxy; therefore A is dominated by
large scale effects and is a good tracer of mergers and gravitational
distortions.



• Clumpiness: the S index has been introduced by Conselice (2003) to quantify
the patchiness of the galaxy, that is the fraction of light coming from small
scale structures, such as clumps of star formation. It is defined as the ratio
of the flux contained in high frequency features to the total flux of the
galaxy. It is computed by subtracting a blurred^4 copy of the image from the
original one, and then normalizing by the total flux of the galaxy. The value
is then corrected by removing the background clumpiness, so it is equal to

    S = 10 \times \left[
        \frac{\sum_{i,j} \left( I(i,j) - I_S(i,j) \right)}{\sum_{i,j} I(i,j)}
        - \frac{\sum_{i,j} \left( B(i,j) - B_S(i,j) \right)}{\sum_{i,j} I(i,j)}
        \right]    (4)

where I and B are the original image of the galaxy and of the background,
respectively, while I_S and B_S are their blurred images. The nuclear 0.25 r_p
region is excluded from the computation, because it would give a high
clumpiness contribution, which is not related to a region of young and intense
star formation. Moreover, negative values after the subtraction of the
smoothed image are forced to zero (Conselice 2003).

Large values of S indicate that most of the light of the galaxy is accumulated
in a few clumpy structures (e.g. starburst galaxies), while low values of S
indicate that the light distribution is smooth (e.g. elliptical galaxies).
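
The three CAS definitions can be illustrated with a small pure-Python sketch on nested lists. It is not PyCASSo, and several simplifications are assumptions made here for brevity: all function names are hypothetical, the rotation is taken about the array centre rather than about the asymmetry-minimizing centre, a boxcar of half-width `w` pixels stands in for the smoothing filter, and negative residuals are clipped for the background patch as well as for the galaxy.

```python
import math

def concentration(radii, cumflux):
    """C = 5 log10(r80 / r20) from a sampled curve of growth."""
    def radius_at(frac):
        target = frac * cumflux[-1]
        for k in range(1, len(radii)):
            if cumflux[k] >= target:
                f0, f1 = cumflux[k - 1], cumflux[k]
                t = (target - f0) / (f1 - f0) if f1 > f0 else 0.0
                return radii[k - 1] + t * (radii[k] - radii[k - 1])
        return radii[-1]
    return 5.0 * math.log10(radius_at(0.8) / radius_at(0.2))

def rotate180(img):
    return [row[::-1] for row in img[::-1]]

def abs_residual(img, other):
    return sum(abs(a - b) for ra, rb in zip(img, other) for a, b in zip(ra, rb))

def asymmetry(img, bkg):
    """A: rotation residual of the galaxy minus that of the background."""
    norm = sum(abs(v) for row in img for v in row)
    return (abs_residual(img, rotate180(img))
            - abs_residual(bkg, rotate180(bkg))) / norm

def boxcar(img, w):
    """Mean filter of half-width w, truncated at the image edges."""
    ny, nx = len(img), len(img[0])
    out = []
    for y in range(ny):
        row = []
        for x in range(nx):
            vals = [img[j][i]
                    for j in range(max(0, y - w), min(ny, y + w + 1))
                    for i in range(max(0, x - w), min(nx, x + w + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def positive_residual(img, w, r_excl=0.0):
    """Sum of (image - smoothed image), negatives forced to zero."""
    smooth = boxcar(img, w)
    yc, xc = (len(img) - 1) / 2.0, (len(img[0]) - 1) / 2.0
    total = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if math.hypot(x - xc, y - yc) >= r_excl:  # skip the nucleus
                total += max(v - smooth[y][x], 0.0)
    return total

def clumpiness(img, bkg, w, r_excl=0.0):
    """S = 10 * (galaxy residual - background residual) / total flux."""
    flux = sum(v for row in img for v in row)
    return 10.0 * (positive_residual(img, w, r_excl)
                   - positive_residual(bkg, w)) / flux
```

Two quick sanity checks follow directly from the definitions: a uniform disc has a curve of growth proportional to r^2, so r_80/r_20 = 2 and C = 5 log10(2) ≈ 1.5; and a perfectly flat image has A = 0 and S = 0.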

G and M20 are based on the segmentation map of the galaxy defined by
Lotz et al. (2004). In contrast with the circular and the elliptical apertures
of the CAS indexes, the segmentation map can assume any irregular shape,
because its constraints (see Appendix A3.7) are only a brightness limit (to
exclude the background and possible spurious pixels) and a continuity
requirement (any source that is not directly connected with the galaxy is not
taken into account). Therefore, the segmentation map can follow accurately the
outline of the galaxy, especially in case of close pairs and mergers.
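
The two constraints on the segmentation map (a brightness limit and connectivity to the galaxy) amount to a flood fill from the galaxy centre. The sketch below is an illustration only: the name `segmentation_map`, the choice of 8-connectivity and the free `threshold` parameter are assumptions, and the actual brightness limit is the one specified in Appendix A3.7, which is not reproduced here.

```python
from collections import deque

def segmentation_map(img, seed, threshold):
    """Boolean mask of pixels >= threshold that are connected to seed.

    img       : 2D list of pixel values
    seed      : (row, column) of a pixel inside the galaxy
    threshold : brightness limit (hypothetical free parameter)
    """
    ny, nx = len(img), len(img[0])
    mask = [[False] * nx for _ in range(ny)]
    y0, x0 = seed
    if img[y0][x0] < threshold:
        return mask
    mask[y0][x0] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):          # visit the 8 neighbours
            for dx in (-1, 0, 1):
                j, i = y + dy, x + dx
                if (0 <= j < ny and 0 <= i < nx
                        and not mask[j][i] and img[j][i] >= threshold):
                    mask[j][i] = True
                    queue.append((j, i))
    return mask

# A galaxy in the top-left corner and a detached source on the right.
image = [[5, 5, 0, 0, 9],
         [5, 0, 0, 0, 9],
         [0, 0, 0, 0, 0]]
seg = segmentation_map(image, seed=(0, 0), threshold=1)
print(sum(v for row in seg for v in row))
```

The bright source in the last column is above the threshold but is not connected to the seed, so it is left out of the map.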

• Gini coefficient: the Gini coefficient is a measure of statistical
dispersion. It is usually adopted in economics to describe the inequality of a
distribution (e.g. levels of income) and was adapted by Abraham et al. (2003)
and Lotz et al. (2004) for the morphological classification of galaxies. The
formulation of the Gini coefficient is based on the Lorenz curve

    L(p) = \frac{1}{\bar{X}} \int_0^p F^{-1}(u)\, du ,    (5)

where p is the percentage of the faintest pixels, F(x) the cumulative
distribution function and \bar{X} the average value of all the X_i
intensities. After some rearrangements (Glasser 1962) and a correction to
compensate for the Poissonian noise in the faintest regions of the galaxy
(Lotz et al. 2004), it can be expressed as

    G = \frac{1}{|\bar{X}|\, n (n - 1)} \sum_{i}^{n} (2i - n - 1)\, |X_i|    (6)

The Gini coefficient is computed on the segmentation map and represents a sort
of generalized concentration index: it tells whether the light is evenly
distributed inside the galaxy, but does not depend on any particular centre.
This index can range from zero, in case of a perfectly uniform distribution,
to one, in case all the light of the object is concentrated in a single pixel.

4 The blurring is obtained by convolving the original image with
a filter of width σ = 0.25 r_p.

• Momentum of light: the M20 coefficient measures how far from the centre the
brightest pixels of the galaxy are located. It is based on the total second
order momentum of light M_tot, that is the sum, over all the pixels of the
segmentation map, of each pixel's flux f_i multiplied by the square of its
distance from the centre:

    M_{tot} = \sum_{i}^{n} M_i
            = \sum_{i}^{n} f_i \left[ (x_i - x_c)^2 + (y_i - y_c)^2 \right]    (7)

The x_c and y_c variables are the coordinates of the galaxy centre, which is
now defined as the pixel that minimizes the value of M_tot. The M20
coefficient is the second order momentum of the brightest 20 per cent of the
galaxy. To compute it, we follow the procedure in Lotz et al. (2004): the
pixels of the segmentation map are sorted by decreasing flux; then the
corresponding momenta M_i are summed, until the sum of the brightest pixels
equals 20 per cent of the total galaxy flux; finally this value is normalized
by M_tot, so

    M_{20} = \log_{10}\left( \frac{\sum_i M_i}{M_{tot}} \right),
    \quad \text{while} \quad \sum_i f_i < 0.2\, f_{tot} .    (8)

The normalization removes dependencies on the size of the object and its total
flux, making M20 less subject to inclination effects. Being weighted by the
square of the distance from the centre, this index is especially suitable for
detecting double nuclei systems (such as close galaxies in a merging phase),
because the brightest pixels of the system are off-centre and they give a
large contribution to the value of M20.
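
Equations (6)-(8) translate almost directly into code. The sketch below is illustrative rather than the PyCASSo implementation; the names `gini` and `m20` are hypothetical. Two choices are assumptions: the centre is taken as the flux-weighted centroid (which is where M_tot, as a function of a continuous centre, reaches its minimum) instead of the minimizing pixel, and the pixel that crosses the 20 per cent flux threshold is included in the sum.

```python
import math

def gini(values):
    """Gini coefficient of eq. (6); pixels are sorted by absolute value."""
    x = sorted(abs(v) for v in values)
    n = len(x)
    mean = abs(sum(values) / n)
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (mean * n * (n - 1))

def m20(pixels):
    """M20 of eq. (8) for pixels given as (x, y, flux) tuples."""
    pixels = list(pixels)
    f_tot = sum(f for _, _, f in pixels)
    # flux-weighted centroid, assumed here as the galaxy centre
    xc = sum(x * f for x, _, f in pixels) / f_tot
    yc = sum(y * f for _, y, f in pixels) / f_tot
    moments = [(f, f * ((x - xc) ** 2 + (y - yc) ** 2)) for x, y, f in pixels]
    m_tot = sum(m for _, m in moments)
    m_bright, f_cum = 0.0, 0.0
    for f, m in sorted(moments, reverse=True):  # brightest pixels first
        if f_cum >= 0.2 * f_tot:
            break
        m_bright += m
        f_cum += f
    # note: undefined (log of zero) if the bright pixels all sit on the centre
    return math.log10(m_bright / m_tot)

print(gini([1, 1, 1, 1]), gini([0, 0, 0, 1]))
print(m20([(1, 0, 4), (-1, 0, 4), (3, 0, 1), (-3, 0, 1)]))
```

The two Gini calls reproduce the limiting cases quoted in the text: zero for a uniform distribution and one when all the light sits in a single pixel.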

The CASGM system relies on the Petrosian radius, which, being based on a curve
of growth, is independent of the galaxy size and largely insensitive to both
the S/N ratio and the surface brightness of the sources (see
Lotz et al. 2004 for a discussion about the influence of low S/N and
resolution on these parameters).

These indexes are related to galaxy morphologies, and the authors of the CASGM
system have calibrated a complete classification using the Frei et al. (1996)
catalogue. Moreover, linking pairs of CASGM indexes, they defined some
fiducial sequences that allow the separation of normal^5 and merging galaxies.
In our work we use the two^6 main merger criteria:



5 Throughout the rest of this text, we will refer to non-merging
galaxies as normal.

6 We also tried the relation based on the asymmetry and the Gini
coefficient (Lotz et al. 2004), but it gave a worse separation of
the merging systems. Therefore, we rejected this relation.

• A-S criterion (Conselice 2003): in the plane A vs. S normal galaxies show a
good correlation

    A_{fit}(R) = (0.35 \pm 0.03) \times S(R) + (0.02 \pm 0.01) .    (9)

The two indexes are computed on R-band images, because they are less sensitive
to bright young stars and provide a more stable relation. Mergers should
deviate from this relation because their light distribution, distorted by
gravitational interactions, raises significantly the value of A, while it has
a weaker influence on the S parameter. Therefore, galaxies that show a large
deviation from the fiducial sequence, or simply a very high value of
asymmetry, that is

    A > A_{fit} + 3\sigma \quad \text{or} \quad A > 0.35    (10)

are classified as mergers (σ is the mean dispersion in equation (9) and is
equal to 0.035).

• G-M20 criterion (Lotz et al. 2004): as in the previous case, the correlation
among normal galaxies in the plane G vs. M20 is used to define this merging
criterion:

    G > -0.115 \times M_{20} + 0.384 .    (11)
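
The two criteria of equations (9)-(11) reduce to simple threshold tests. The sketch below only encodes these two published relations with the coefficients quoted above; the function names are hypothetical, and it does not attempt to reproduce the combined criterion proposed in this work.

```python
def is_merger_as(a, s, sigma=0.035):
    """A-S criterion: eq. (10), with the fiducial sequence of eq. (9)."""
    a_fit = 0.35 * s + 0.02
    return a > a_fit + 3.0 * sigma or a > 0.35

def is_merger_gm20(g, m20):
    """G-M20 criterion: eq. (11)."""
    return g > -0.115 * m20 + 0.384

# Illustrative index values, not measurements from the paper.
print(is_merger_as(0.10, 0.10))    # below the fiducial sequence + 3 sigma
print(is_merger_as(0.40, 1.50))    # selected by the A > 0.35 branch
print(is_merger_gm20(0.60, -1.5))  # above the G-M20 dividing line
print(is_merger_gm20(0.50, -2.0))  # below the G-M20 dividing line
```

For S = 0.10 the fiducial asymmetry is 0.055, so the 3σ threshold sits at 0.16; an object with A = 0.10 is therefore classified as normal by this criterion.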



3.2 Data processing 



Our data processing workflow is organized into three main steps, each one
coupled with a specific piece of software.

- Data acquisition: SDSS frames cover a field of view of ~14 × 10 arcmin.
Because our galaxies are nearby and extended, they are often close to the edge
of the image, or they straddle multiple frames. We use the software MONTAGE^7
to assemble multiple images in FITS format (details about this step are given
in Appendix A1).

The Hyperleda database is an ideal starting catalogue for this operation since
it provides, for each galaxy, the list of properties (coordinates, diameter,
position angle, redshift, etc.) needed to run MONTAGE automatically. Because
we are still dealing with a moderate number of sources, we carefully checked
the correct assembly of all the images.

- Pre-processing: in this step we prepare the image for the computation of the
structural indexes: every feature that might affect the CASGM analysis
(e.g. bright foreground stars, cosmic rays, image artifacts, etc.) must be
masked. For our automated workflow, we used the software SExtractor
(Bertin & Arnouts 1996), which provides a fast detection of all the sources in
the image. Source identification is essentially based on local intensity and
contrast, but the software also examines the light profile, extracting a
number of properties (for a detailed description see Appendix A2). Therefore,
at the end of the pre-processing step^8, the original image is associated with
a SExtractor catalogue, and with several "service" images, which specify the
regions to exclude and provide useful information for the CASGM analysis.



7 Developed by the NASA Earth Science Technology Office; 
http://montage.ipac.caltech.edu/ 



8 After the pre-processing phase, about 12 per cent of the
sources are discarded, usually because the Hyperleda coordinates
are wrong, MONTAGE cannot produce the mosaic, or it is impossible
to set up the image properly (e.g. because the galaxy is
too faint and is not fitted correctly by SExtractor).



- CASGM analysis: the crucial part of this work is entrusted to our software
PyCASSo (Python CAS Software), whose task is to compute the CASGM structural
indexes. PyCASSo is entirely developed in Python, a high-level,
object-oriented programming language with an extensive standard library and
the possibility to import modules^9 suited for handling scientific data and
astronomical images. We give a detailed description of the algorithms
implemented in PyCASSo in Appendix A.



We tested our workflow and software on the Frei et al. (1996) catalogue. This
catalogue collects a sample of nearby, well-resolved galaxies, and it is
therefore suitable for testing the reliability of the algorithm, possible side
effects (see Appendix B for an image degradation discussion) and improvements
in the implementation of the CASGM indexes. We compared our results on these
galaxies with Conselice (2003), Lotz et al. (2004) and Vikram et al. (2010)
and we found a very good agreement: on average, the C, A, S and M20
coefficients are consistent within 1σ with the results of the other authors,
while the Gini coefficient is in agreement within 1.5σ.

To further test the CASGM method, we carried out a
visual classification of all the systems identified as mergers
by the CASGM analysis, both in the BAT and in the control
samples. The visual classification assigns each galaxy to one
of three classes: (i) "normal" galaxies, which do not show any
signs of interaction (i.e. appear regular and isolated); (ii)
"edge-on" galaxies, which are intentionally kept separate
from non edge-on galaxies to study possible biases related to
dust bands, as highlighted by other studies (e.g. Jogee et al.
2009; De Propris et al. 2007); (iii) "merger" systems, i.e.
close pairs of galaxies and sources showing morphological
distortions or perturbations (such as tidal tails or double
nuclei). The visual classification is based first on the
RGB images available in the SDSS database and
on the corresponding FITS images. Where available, we
also exploited the spectroscopic data, to distinguish projected
pairs of galaxies from real ones. Finally, for the most critical
objects, we searched for further information in NED¹⁰ and
in the literature.



4 RESULTS 

Here we present the results of our automated classification 
and the merger fraction of the two samples. As explained in 
Section 2.2, the BAT sample is a collection of systems, so
we have to switch from galaxy to system classification also 
in the control sample, to make them fully comparable. To 
this purpose we consider as a single system any ensemble of 
galaxies for which one galaxy, at least, has been classified as 
interacting. The interacting or non-interacting classification 
is of course provided by the specific merger criterion consid- 
ered. 



9 In particular, PyCASSo needs the NumPy scientific module, essential
for matrix operations; PyFITS, used to read images in FITS
format; and matplotlib, used to create control images and plots.

10 NASA/IPAC Extragalactic Database,
http://ned.ipac.caltech.edu/






We ran PyCASSo using both elliptical and circular apertures
and we visually checked the control images and the
results produced by our software. In most cases the
two analyses coincide, but for some classes of objects (e.g.
edge-on galaxies and mergers) the elliptical apertures prove to be
more reliable, because they better fit the outline of these
sources. For elongated objects, instead, circular apertures
include a large amount of background, so the corrections
applied to the asymmetry and the clumpiness become
more critical. For these reasons, we report only the results
of the elliptical classification.

Lotz et al. (2004) studied the typical errors associated with
the CASGM measurements by comparing the Frei et al.
(1996) images with the SDSS images of the same galaxy sample.
These differences provide an average estimate of the
uncertainties on the indexes, because: (i) they implicitly take into
account the slight smoothing effect introduced by Montage
(because the Frei et al. (1996) galaxies always belong to a
single frame); (ii) they take into account the differences due
to image resolution and quality (because the Frei et al. (1996)
images have a lower resolution, so they are similar to SDSS
images at larger redshifts). The uncertainties on the
structural indexes are the following: $\delta C = 0.11$, $\delta A = 0.04$,
$\delta S = 0.09$, $\delta G = 0.02$, $\delta M_{20} = 0.12$.



4.1 Results of the BAT sample 

The results obtained on the BAT sample, using both the
visual and the automated classifications, are reported in
Table 1 and discussed in the following (errors on the merger
fractions are at the 68 per cent confidence level and have been
computed using the Gehrels (1986) prescriptions):

- Visual classification: through the visual classification we
estimate a merger fraction of $20^{+7}_{-5}$ per cent (we identify 12
mergers, 9 edge-on galaxies and 38 normal systems).

- A-S classification: the criterion based on the asymmetry
and the clumpiness (equation 10) detects 20 mergers,
giving a merger fraction of $34 \pm 7$ per cent. Eleven of the
12 systems visually classified as mergers have the same
classification with the A-S method (see Table 1 and Figure 1,
upper panel). The higher fraction of mergers detected
with the A-S method is due to a moderate contamination
by normal systems with low clumpiness. In these
cases, even a small asymmetry contribution, produced by
small spurious¹¹ sources within the CAS aperture, may be
enough to label the galaxy as interacting.

- G-M20 classification: the criterion based on the Gini
coefficient and the second-order moment of light (equation 11)
identifies 18 mergers, giving a merger fraction of $31 \pm 7$ per cent.
In this case, the higher fraction of merging systems with
respect to the visual classification is due to the contamination
produced by edge-on galaxies. These galaxies are
observed through dust bands that obscure the central part



11 For example, in some cases SExtractor is not able to separate
faint high-redshift background galaxies from the
main one, and the same occurs for small foreground stars.
If the clumpiness value is near zero, the asymmetry contribution
coming from these sources may cause the galaxy to be
misclassified as a merger.
Figure 1. Comparison between visual and automated classifications.
Red circles: galaxies visually classified as interacting; green
squares: edge-on galaxies; blue asterisks: normal galaxies. The
structural indexes classify as mergers the galaxies lying above the
dotted lines. The error bars are average differences between SDSS
and Frei observations of the same objects (Lotz et al. 2004). The
A-S criterion shows a slight contamination by normal
galaxies, while the G-M20 criterion is biased towards edge-on galaxies.



of the source and leave two bright areas symmetrically off-centre,
which influence the moment of light. A similar effect
also occurs for strongly barred galaxies. Ten of the
12 systems visually classified as mergers have the same
classification with the G-M20 method (see Table 1 and
Figure 1, lower panel).
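The 68 per cent confidence limits quoted for the merger fractions can be illustrated with a short sketch. It assumes that exact one-sided binomial (Clopper-Pearson) limits at a confidence level of 0.8413 are an adequate stand-in for the Gehrels (1986) prescriptions; the function name `binomial_limits` and the bisection tolerance are illustrative choices, not part of PyCASSo.

```python
from math import comb


def binomial_limits(k, n, cl=0.8413, tol=1e-10):
    """Exact one-sided binomial limits on a fraction, for k successes in n trials.

    Assumption: cl = 0.8413 (one-sided) corresponds to a 1-sigma Gaussian
    interval, i.e. a central interval of about 68 per cent.
    """
    alpha = 1.0 - cl

    def prob_at_least_k(p):
        # P(X >= k | p): increases with p
        return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

    def prob_at_most_k(p):
        # P(X <= k | p): decreases with p
        return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, k + 1))

    def solve(func, target, increasing):
        # Plain bisection on [0, 1] for func(p) = target
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if (func(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = 0.0 if k == 0 else solve(prob_at_least_k, alpha, increasing=True)
    upper = 1.0 if k == n else solve(prob_at_most_k, alpha, increasing=False)
    return lower, upper


if __name__ == "__main__":
    # 12 mergers among the 59 BAT systems
    low, high = binomial_limits(12, 59)
    print(f"f_m = {12 / 59:.3f}, 68 per cent interval = [{low:.3f}, {high:.3f}]")
```

For the 12 mergers found among 59 systems, the resulting interval can be compared directly with the range quoted for the BAT sample in Table 2.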



4.1.1 Improvement of the CASGM system

As shown in the previous section, the automated classifications
correctly identify almost all the interacting systems,
but they systematically overestimate the real number of
mergers. For this reason, we introduce an advanced criterion
that blends together¹² the previous procedures: we consider
as mergers only those systems that simultaneously satisfy
the A-S and the G-M20 criteria. All these indexes have
similar resolution and S/N requirements and so they can be


12 Some attempts were made by Lotz et al. (2004), who
studied other combinations of CAS and GM indexes (i.e. G-A,
G-S, A-M20) which, however, did not produce better
classifications than the A-S and G-M20 criteria.
PBC J           Visual      Automated classification
name            analysis    A-S    G-M20    Combined crit.

0255.2-0011     ...         ...    ...      ...
0742.4+4498     ...         ...    ...      ...
1345.4+4141     e           ...    ...      ...
...             M           x      x        x
2318.9+0014     M           x      x        ...

Total mergers   12          20     18       12



Table 1. Merger identifications in the BAT sample: we report
only those galaxies that have been tagged as interacting by at
least one classification method. In the visual classification, "M"
identifies mergers, "n" the non-interacting galaxies and "e" the
edge-on galaxies. The mergers of the automated criteria are
labelled by an "x" mark. The combined
criterion is much more reliable than the others: it removes
most of the contamination and provides results in good
agreement with our visual analysis.



used together; however, this choice may limit the effectiveness
of the merger identification, because each method is not
sensitive to the entire duration of the merger and the interaction
phases mapped by each criterion do not fully overlap
(see Lotz et al. 2008 and Conselice 2006). We expect the
combined criterion to be much more reliable than the original
ones. For instance, the G-M20 contamination should
be largely removed because edge-on and barred galaxies are
basically symmetric and, therefore, they should be excluded
by adding the A-S classification.
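The logic of the combined criterion can be sketched in a few lines. The default thresholds below are assumptions taken from the commonly quoted literature forms (A > 0.35 and A > S for the CAS merger criterion of Conselice 2003; G > -0.115 M20 + 0.384 for the Lotz et al. 2004 criterion); they may differ from equations 10 and 11 of this paper, in which case the parameters should be replaced. The function names are illustrative and are not part of PyCASSo.

```python
def is_merger_as(a, s, a_min=0.35):
    """A-S criterion (assumed form): asymmetry above a_min and above clumpiness."""
    return a > a_min and a > s


def is_merger_gm20(g, m20, slope=-0.115, intercept=0.384):
    """G-M20 criterion (assumed form): the galaxy lies above a line in the G-M20 plane."""
    return g > slope * m20 + intercept


def is_merger_combined(a, s, g, m20):
    """Combined criterion: a system is a merger only if both criteria are satisfied."""
    return is_merger_as(a, s) and is_merger_gm20(g, m20)


if __name__ == "__main__":
    # Hypothetical index values, chosen only to exercise the three outcomes
    examples = {
        "symmetric edge-on": dict(a=0.10, s=0.05, g=0.60, m20=-1.0),
        "asymmetric, low G": dict(a=0.50, s=0.20, g=0.45, m20=-2.0),
        "disturbed system": dict(a=0.50, s=0.20, g=0.60, m20=-1.0),
    }
    for name, idx in examples.items():
        print(name, is_merger_combined(**idx))
```

A symmetric object that passes only the G-M20 test (the edge-on case discussed above) is rejected because it fails the A-S test, which is the behaviour expected from the combined criterion.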



4.1.2 Merger fraction of the BAT sample

The combined criterion proves to be an optimal solution:
it misses almost no mergers compared to the
previous criteria and it removes about 77 per cent of their
wrong classifications, leading to a merger identification in
excellent agreement with our visual analysis (see Table 1
and Figure 2). With the combined criterion, we detect
12 disturbed systems, so the merger fraction of the BAT
sample is $20^{+7}_{-5}$ per cent.







Figure 2. Comparison between our combined criterion and the
visual classification: the structural indexes classify as mergers the
galaxies lying in the top right-hand sector, while symbols and
colours are the same as in Figure 1. The combined criterion shows
good agreement with our visual classification: the contamination
affecting the original criteria is almost completely removed.



Even if the small sample size does not allow any strong
conclusions, we point out that the merger fraction in each redshift
bin is almost constant (Table 2), so it does not display any
evident sign of evolution in the local Universe.

Our results are in excellent agreement with Koss et al.
(2010), who performed a visual analysis on a similar BAT
subsample and found a merger fraction of 25 per cent, by
considering all the perturbed galaxies and the pairs with a
separation below 30 kpc. We have compared the luminosity
distributions (14-150 keV band) of the interacting and the
non-interacting systems of our BAT sample and, according
to a KS test ($p_{\rm KS} = 0.2$), the luminosity distributions of
the two subsamples do not display significant differences.



4.1.3 Statistical corrections 

It is possible to further improve our results, by estimating 
the completeness and the reliability of the automated clas- 
sification and applying a statistical correction to the merger 
fraction. 

- Completeness: it quantifies the amount of missed 
mergers, that is the number of systems that have been la- 
belled as "interacting" by the visual classification, but as 
"non interacting" by the combined criterion. We define this 
coefficient as 



$$C_{\rm CASGM} = \frac{N_{m,{\rm true}}}{N_{m,{\rm visual}}}, \qquad (12)$$

where $N_{m,{\rm true}}$ is the number of mergers in common between
the automated and the visual classification, while $N_{m,{\rm visual}}$
represents the number of mergers of the visual classification.
By extrapolating the completeness from the BAT sample, we
obtain $C_{\rm CASGM} = 10/12 \simeq 0.8$. This parameter allows us
to derive the real merger fraction of the sample: it
tells us that the number of mergers that have been correctly¹³



13 Spurious and wrong merger detections must be excluded from 
the sum. 






                     BAT sample                       Control sample
Redshift             N_sys  N_m  f_m (per cent)       N_sys  N_m  f_m (per cent)

0.003 ≤ z < 0.01     15     3    20 (9-36)            63     3    4.8 (2.2-9.2)
0.01 ≤ z < 0.02      16     3    19 (9-34)            67     4    6.0 (3.1-10.5)
0.02 ≤ z < 0.03      28     6    21 (13-32)           117    9    7.7 (5.2-11.0)
Total (CASGM)        59     12   20 (15-27)           247    16   6.5 (4.9-8.5)
Corrected            59     12   20 (15-27)           247    10   4.0 (2.8-5.7)



Table 2. Detailed comparison of the merger fraction $f_m$ of the BAT and of the control sample in each redshift bin, according to the
classification of the combined criterion. The "Total (CASGM)" line summarizes the results of the CASGM classification alone,
while the "Corrected" line gives the merger fractions after the application of the reliability and completeness corrections.
AGN host galaxies are found in a phase of interaction more frequently than a random selection of galaxies in the same redshift
interval. This suggests that there is a link between the merging event and the activity of the SMBH at the centre of galaxies.



detected by the automated classification is about 80 per cent 
of the real number. 

- Reliability: it quantifies the fraction of normal systems
that have been erroneously classified as mergers by the
automated procedure. We define it through the probability, $P$,
that the procedure gives a false positive (false merger) in
the case of a non-merging system, i.e.:

$$P_{\rm CASGM} = \frac{N_{m,{\rm false}}}{N_{\rm normal}}, \qquad (13)$$

where $N_{m,{\rm false}}$ is the number of wrong mergers and $N_{\rm normal}$
is the number of non-interacting sources (that is, the difference
between the number of systems $N_{\rm sys}$ in the sample and
the number of real mergers $N_{m,{\rm real}}$). By extrapolating this
value from the BAT sample, we obtain $P_{\rm CASGM} = 2/47 \simeq 0.04$,
which means that about 4 per cent of the non-interacting
systems are instead classified as mergers by the
combined criterion.

A good knowledge of these coefficients is extremely useful
for correcting the merger fraction of very large samples
that cannot be visually inspected. In fact, by applying the
reliability correction, we obtain the number of "true" mergers
detected by the software, and then, taking into account
the completeness coefficient, we can estimate the real number
of interacting systems $N_{m,{\rm real}}$:

$$N_{m,{\rm real}} = \frac{N_m - P_{\rm CASGM} \times N_{\rm sys}}{C_{\rm CASGM} - P_{\rm CASGM}}, \qquad (14)$$

where $N_m$ is the number of mergers detected by the combined
criterion and $N_{\rm sys}$ is the total number of systems in
the sample.
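Equation 14 follows from writing the detected mergers as the sum of the real mergers that are recovered and the normal systems that are misclassified, $N_m = C_{\rm CASGM} N_{m,{\rm real}} + P_{\rm CASGM} (N_{\rm sys} - N_{m,{\rm real}})$, and solving for $N_{m,{\rm real}}$. A minimal sketch of the three coefficients is given below; the function names are illustrative and are not taken from PyCASSo.

```python
def completeness(n_true, n_visual):
    """Equation 12: fraction of visually identified mergers recovered automatically."""
    return n_true / n_visual


def false_positive_rate(n_false, n_normal):
    """Equation 13: probability that a non-interacting system is flagged as a merger."""
    return n_false / n_normal


def real_mergers(n_m, n_sys, c, p):
    """Equation 14: statistically corrected number of interacting systems."""
    if c <= p:
        raise ValueError("the completeness must exceed the false-positive rate")
    return (n_m - p * n_sys) / (c - p)


if __name__ == "__main__":
    c = completeness(10, 12)          # BAT sample
    p = false_positive_rate(2, 47)    # BAT sample
    print(f"C = {c:.2f}, P = {p:.3f}")
    # Control sample with the rounded coefficients C = 0.8 and P = 0.04
    print(f"N_m,real = {real_mergers(16, 247, 0.8, 0.04):.1f}")
```

With the rounded coefficients, the 16 detections among the 247 control systems correspond to about 8 real mergers.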

4.2 Results of the control sample 

4.2.1 Merging fraction and statistical corrections

The procedure described in the previous sections detects
16 merging systems in the control sample (see Table 2),
corresponding to a merger fraction of $f_{m,{\rm control}} = 6.5^{+2.0}_{-1.6}$
per cent. This fraction, however, does not take into account
the corrections for reliability and completeness
previously discussed. Using our estimates of $P_{\rm CASGM}$
and $C_{\rm CASGM}$ based on the BAT sample, we derive that the
real number of mergers in the control sample is (see equation 14)
$N_{m,{\rm real}} \simeq 8$. In particular, the expected number of
true mergers among the 16 detected by the algorithm is $\sim 6$
($P_{\rm CASGM}$ correction), while two more real mergers are
expected to be missed by the procedure ($C_{\rm CASGM}$ correction).



Given the large fraction of the detected mergers that are ex- 
pected to be spurious (more than 60 per cent), we have visu- 
ally inspected all the 16 systems found by the procedure as 
mergers, to confirm and better constrain the actual number 
of false/true mergers. In good agreement with our expec- 
tations, we find that only 8 systems are true mergers, the 
remaining ones being star-burst or irregular galaxies. This 
number confirms that the procedure works similarly in the 
BAT and in the control sample. 

By also applying the completeness correction (equation
12), we derive that the total number of real mergers in the
control sample is $8/0.8 \simeq 10$, which corresponds to a merger
fraction of:

$$f_{m,{\rm corr,control}} = 10/247 \simeq 4.0^{+1.7}_{-1.2} \ {\rm per\ cent}. \qquad (15)$$

In addition, the large number of objects in this sample
allows us to derive an estimate of $P_{\rm CASGM}$ which is more
accurate than the one based on the BAT sample:

$$P_{\rm CASGM} = 8/237 \simeq 0.034. \qquad (16)$$



Our results show that the average merger fraction of
galaxies at redshift $z \sim 0$ is very low, in accordance with the
studies of Patton & Atfield (2008), Patton et al. (2002) and
Koss et al. (2010), who report a merger fraction of $\sim 1-2$
per cent. The higher value suggested by our work is probably
related to a selection effect, because our control sample is
not drawn as a random selection of galaxies in the chosen
redshift interval, but is forced to follow the redshift
distribution of the BAT sample. This confirms the importance of
building a control sample which reflects, as much as possible, all
the key properties of the other sample. The merger fraction
found in the control sample is significantly ($3\sigma$) lower than
that found in the BAT sample.



4.2.2 The role of the galaxy mass distribution

The (stellar) mass distribution of galaxies hosting BAT 
AGNs is very likely to be different from that of inactive 
galaxies or SDSS AGNs (Koss et al. 2011), with BAT AGNs 
typically residing in galaxies more massive than average. 

The effects of galaxy mass upon the merger fraction
measured through the CASGM method are uncertain¹⁴.



14 Patton & Atfield (2008) find that the frequency of galaxy pairs
is larger for low-luminosity (and, presumably, low-mass) than for 
high-luminosity galaxies; but this trend is reversed when they cor- 
Figure 3. Classification of the systems of the control sample according
to the combined criterion: the systems lying in the top right-hand
sector of the plot are labelled as mergers by the automated
criterion. For these objects, we performed a visual analysis (red
circles: galaxies visually classified as interacting; blue asterisks:
normal galaxies), while black triangles represent non-merger
systems according to CASGM.

However, if the mass dependence is relatively strong, we 
might obtain different merger fractions for the BAT and the 
control sample simply because of their different mass distri- 
butions. Therefore, it is necessary to check this hypothesis. 

As a first step we evaluated galaxy stellar masses: this
was done by converting the ugriz magnitudes from the
SDSS into Johnson BVRI magnitudes (using the formulae
in Blanton & Roweis 2007), calculating the distance modulus
(DM) from the redshift of each galaxy (we assumed
$H_0 = 71$ km s$^{-1}$ Mpc$^{-1}$), and finally estimating the stellar
masses as

$$\log(M_*/M_\odot) = \log[M_I(B-R)] - 0.4\,(I - {\rm DM} - I_\odot), \qquad (17)$$

where $M_I(B-R)$ is the mass-to-light ratio (in solar units)
provided by Bell & de Jong (2001) for the I band as
a function of the $B-R$ colour of the galaxy, $I$ is
the apparent I magnitude of the galaxy, and $I_\odot = 4.52$ is the
absolute I magnitude of the Sun.
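The stellar-mass estimate can be sketched as follows, using the usual magnitude convention in which the I-band luminosity in solar units is $10^{-0.4 (I - {\rm DM} - I_\odot)}$. Two assumptions are made here: the distance modulus is derived from the low-redshift Hubble law $d = cz/H_0$, which is adequate only for $z \ll 1$, and the logarithm of the mass-to-light ratio is passed in directly (`log_ml_i`), since the Bell & de Jong (2001) colour coefficients are not reproduced in this section. Function and parameter names are illustrative.

```python
from math import log10

C_KM_S = 299792.458  # speed of light in km/s


def distance_modulus(z, h0=71.0):
    """Distance modulus from redshift, assuming d = c z / H0 (valid for z << 1)."""
    d_mpc = C_KM_S * z / h0
    return 5.0 * log10(d_mpc * 1.0e6 / 10.0)  # distance in pc over 10 pc


def log_stellar_mass(i_mag, dm, log_ml_i, i_sun=4.52):
    """log10(M*/Msun) from the apparent I magnitude and the I-band log10(M/L).

    log_ml_i is the log10 of the mass-to-light ratio in solar units, to be
    evaluated from the B-R colour by the caller (assumption of this sketch).
    """
    log_lum = -0.4 * (i_mag - dm - i_sun)  # log10 of the I-band luminosity in solar units
    return log_ml_i + log_lum


if __name__ == "__main__":
    # Hypothetical galaxy: I = 12.0 at z = 0.02, with M/L = 1 in solar units
    dm = distance_modulus(0.02)
    print(f"DM = {dm:.2f}, log(M*/Msun) = {log_stellar_mass(12.0, dm, 0.0):.2f}")
```

Brighter galaxies (smaller $I$) and more distant galaxies (larger DM) at fixed apparent magnitude both yield larger masses, as expected.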

Figure [4] compares the distributions of stellar masses in 
the BAT and in the full control sample: the two distributions 
are quite different, as massive galaxies are much more fre- 
quent in the BAT sample. We note that this is partly caused 
by the contribution of the AGNs within the galaxies of the 
BAT sample; however, the observed difference in luminosi- 
ties is very large (the medians of the two samples differ by 
a factor of ~ 5), and cannot be explained in this way. 

We checked whether this difference in the mass distributions
could account for the difference in the merger fraction
by building a mass-matched sample in the same way as we
built a redshift-matched sample (see Section 2.2). In this case,
we divided the galaxies into 3 mass bins ($M_*/M_\odot < 10^{9.5}$;
$10^{9.5} \le M_*/M_\odot < 10^{10.5}$; $M_*/M_\odot \ge 10^{10.5}$), and extracted
173 systems from the full control sample.

Within the mass-matched control sample, 11 systems
are classified as mergers by the CASGM combined criterion;
this corresponds to an uncorrected merger fraction
$f_{m,{\rm MMS}} = 11/173 \simeq 6.4$ per cent, and to a corrected
merger fraction of $f_{m,{\rm corr,MMS}} = 3.9$ per cent, in very
good agreement with the values for the redshift-matched
control sample.

This result should be taken with caution, since the redshift
distribution of the mass-matched control sample is different
from that of the BAT sample. An ideal comparison
should use a sample that simultaneously matches both the
mass and redshift distributions of the BAT sample; unfortunately,
our full control sample does not allow us to proceed in
this way, as it includes only a small number (5) of high-mass
($M_*/M_\odot \ge 10^{10.5}$) systems at $z < 0.02$.

However, we can look at the simultaneous effect of both
mass and redshift in two different ways. In the $0.02 \le z < 0.03$
bin the full control sample includes a reasonable number
(68) of high-mass systems: therefore, we extracted a
mass-matched control sample within this redshift bin, where
the combined criterion finds 15 mergers among the 162 systems.
This corresponds to an uncorrected merger fraction
$f_{m,{\rm MMS},z \ge 0.02} = 15/162 \simeq 9.3$ per cent, and to a
corrected merger fraction of $f_{m,{\rm corr,MMS},z \ge 0.02} = 6.3$ per
cent; both values are consistent with the results for the same
redshift bin that we gave in Table 2 ($f_{m,z \ge 0.02} = 7.7^{+3.3}_{-2.5}$ per
cent, with a corrected value $f_{m,{\rm corr},z \ge 0.02} \simeq 5$ per cent).

Instead, when looking at our full redshift range, we evaluate
the uncorrected merger fraction in each bin of redshift
and mass, and average them so as to reproduce the mass
and redshift distribution of the BAT sample. In this way,
we get an uncorrected merger fraction $f_{m,{\rm avg}} = 7.2$ per
cent, and a corrected merger fraction¹⁵ $f_{m,{\rm avg,corr}} = 3.7$
per cent. The large errors derive from the highly uncertain
merger fractions of high-mass systems at $z < 0.02$:
if instead we make the very reasonable assumption that
these are equal to what we find for high-mass systems at
$0.02 \le z < 0.03$ ($f_{m,z \ge 0.02,\log(M) \ge 10.5} = 8.8$ per cent,
fully compatible both with the scarce high-mass data at
$z < 0.02$, and with the redshift trend of the merger fractions
in the other mass bins), we obtain an uncorrected merger
fraction $f_{m,{\rm avg*}} = 5.9$ per cent, and a corrected one of

./nijavg* ,corr — ^"^—2.1 PCr Cent. 
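The averaging described above amounts to weighting the control merger fraction of each bin by the share of BAT systems in that bin. The sketch below illustrates this with the redshift bins of Table 2 only, because the counts in the joint redshift and mass bins are not listed in this section; the function name and the dictionary layout are illustrative assumptions.

```python
def reweighted_fraction(target_counts, control_mergers, control_systems):
    """Average the control merger fractions per bin, weighted by the target sample.

    All three arguments are dictionaries sharing the same bin keys.
    """
    total = sum(target_counts.values())
    fraction = 0.0
    for bin_key, n_target in target_counts.items():
        if n_target == 0:
            continue  # bins not populated by the target sample carry no weight
        n_sys = control_systems[bin_key]
        if n_sys == 0:
            raise ValueError(f"no control systems in bin {bin_key!r}")
        fraction += (n_target / total) * control_mergers[bin_key] / n_sys
    return fraction


if __name__ == "__main__":
    # Counts from Table 2 (redshift bins only)
    bat = {"0.003-0.01": 15, "0.01-0.02": 16, "0.02-0.03": 28}
    ctrl_mergers = {"0.003-0.01": 3, "0.01-0.02": 4, "0.02-0.03": 9}
    ctrl_systems = {"0.003-0.01": 63, "0.01-0.02": 67, "0.02-0.03": 117}
    f_avg = reweighted_fraction(bat, ctrl_mergers, ctrl_systems)
    print(f"re-weighted uncorrected fraction = {100 * f_avg:.1f} per cent")
```

Because the control sample of Table 2 already follows the BAT redshift distribution, the re-weighted value stays close to the unweighted 16/247. The same function applies unchanged to bins keyed by both redshift and mass.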

We conclude that simultaneously controlling for the
mass and redshift distributions cannot reconcile the merger
fractions of the BAT and the control sample. This is
shown (at the $1.8\sigma$ level) for the $0.02 \le z < 0.03$
redshift bin. In the full sample it depends somewhat on
the assumption that the merger fraction for galaxies with
$M_* > 10^{10.5} M_\odot$ does not change between $z = 0.003$ and
$z = 0.03$: if this assumption is made, the (corrected) merger
fractions of the two samples differ at the $2.6\sigma$ level.



rect for perspective pairs. The CASGM method is somewhat in 
between the two cases: a galaxy pair which is well-separated on 
the sky will be classified as a merger only if there are morpholog- 
ical anomalies (i.e. if the pair is physical); but the method cannot 
distinguish physical and perspective pairs if the sky separation is 
small. Therefore, the CASGM-measured merger fraction should have
only a weak dependence on galaxy mass.



15 If $f_m = N_m/N_{\rm sys}$ is the uncorrected merger fraction,
equation 14 implies that $f_{m,{\rm corr}} = N_{m,{\rm real}}/N_{\rm sys} =
(f_m - P_{\rm CASGM})/(C_{\rm CASGM} - P_{\rm CASGM})$.







[Figure 4: x-axis $\log(M_*/M_\odot)$, ticks from 9.0 to 11.5; top panel labelled "BAT sample".]

Figure 4. Comparison between the mass distributions of the
BAT sample (top panel) and of the full control sample (bottom
panel).



5 SUMMARY AND CONCLUSIONS

In this work we focused on three main topics:

(i) Software: we have implemented the new software
PyCASSo for the automated computation of the structural
indexes of the CASGM system. Our procedures are entirely
based on the definitions and relations presented in Conselice
(2003) and Lotz et al. (2004), but we have implemented the
possibility of using elliptical apertures, because they provide
a better fit of the galaxy outline. Moreover, we carried
out extensive tests on possible image degradations, so our
software minimizes any data loss and smoothing effect and
provides a stable and reliable analysis.

(ii) Method: we propose an improved technique for
evaluating the merger fraction of a galaxy sample by
means of the CASGM system. We show that the
original classification is biased towards irregular, edge-on
and dusty galaxies, which tend to be misclassified as
mergers. We propose a combined criterion based on the
A, S, G and M20 indexes, which fully blends
the CAS and GM methods and corrects nearly
80 per cent of the contamination. Then, we define the
completeness and the reliability coefficients, which allow a
statistical correction of the merger fraction and further
reduce possible residual errors in the automated classification.

(iii) Application: we have applied the CASGM analysis
to a sample of local AGN host galaxies and a comparison
sample, to extract their merger fractions and test whether
there is an enhanced fraction of mergers among active galaxies.
We found that in the BAT sample the merger fraction
is $20^{+7}_{-5}$ per cent. In the redshift-matched control sample the
merger fraction is $4.0^{+1.7}_{-1.2}$ per cent, and the difference is
significant at the $3\sigma$ level. We obtain similar results for a
mass-matched control sample. Simultaneously matching redshift
and mass leads to comparable but somewhat less significant
results.

Our work is in agreement with other observational studies
(Sanders et al. 1988; Koss et al. 2010) and numerical
simulations (Barnes & Hernquist 1991; Di Matteo et al. 2005;
Hopkins et al. 2006) that suggest that galaxy interactions
trigger the activity of the SMBH at their centre. The most
likely scenario is that the strong gravitational perturbations
drive large quantities of gas towards the centre of the remnant,
triggering both an intense starburst phase and
enhanced nuclear activity. Mergers may therefore be responsible
not only for large-scale ($\sim 10^3$ pc) distortions, but also
for the inflow of gas down to the typical scale of SMBH
accretion ($\sim 10^{-4}$ pc). Current numerical simulations cannot
fully cover such a wide range of scales, so observational
studies play a key role in understanding these phenomena.
However, as we pointed out in Section 1, similar
studies on higher-redshift ($0.2 < z < 1.2$) galaxy samples
(e.g. Cisternas et al. 2011; Gabor et al. 2009; Pierce et al.
2007) do not show any enhancement of the merger fraction
of AGN host galaxies. Selection biases in the active
galaxy sample and/or in the control sample could partially
explain these contradictory results. For example, the
aforementioned studies are based on other selection criteria (e.g.
soft X-ray, 2-10 keV energy band) and, because of the significant
fraction of obscured AGNs (see Menci et al. 2008), they
may detect a lower number of sources compared to our hard
X-ray (15-195 keV) selection.

Therefore, while the results presented here and in previous
observational studies (e.g. Koss et al. 2010) suggest that
in the low-redshift ($z < 0.03$) Universe galaxy interactions
trigger the activity of the SMBH at their centre, further
studies that focus on an accurate and unbiased selection of
galaxies - both at intermediate ($0.03 \le z < 0.2$) and higher
($0.2 \le z < 1.2$) redshifts - are needed to derive improved
estimates of the occurrence and role of galaxy interactions
in SMBH activity.
APPENDIX A: DATA PROCESSING ALGORITHMS

The image analysis process is split into three main phases:
data acquisition, pre-processing and processing. In the first
two phases we essentially use publicly available codes
(Montage and SExtractor), whereas for the processing
phase we developed the software PyCASSo.



A1 Montage: data acquisition

We use the software Montage to automatically assemble 
multiple SDSS frames, in order to obtain full images of the 
desired galaxies. This software needs as input the central co- 
ordinates, the band of observation and the sizes (arcmin) of 
the desired field of view. It automatically queries the SDSS 
database for the frames that compose the image, and, ex- 
ploiting the astrometry and the calibrations of the original 
frames, it proceeds with their alignment and superposition, 
it compares the intensities of the overlapping pixels and it 
corrects possible background offsets and gradients, to pro- 
duce a uniform mosaic. Montage preserves all the informa- 
tion of the original images (such as the photometric intensity 
of the sources and the World Coordinate System), it is able 
to assemble together a large number of frames and has a very 
good success rate (more than 95 per cent in our experience) 
so it is the ideal instrument for our automated workflow. 
Moreover, the images returned by Montage are centred on 
the selected coordinates, so the queried galaxy is always in 
the middle of the frame. 

Montage has two drawbacks: Moiré patterns and slight image
degradation. The first is an interference pattern that
occurs when two grids, with different orientations or mesh
sizes, are superimposed. This is unavoidable, because Montage
has to create a new grid of pixels (the final image) and
overlap portions of the original frames, rotating them properly
to match each other. Conselice (2003) carried out numerical
simulations to determine the impact of correlated noise
on the asymmetry measure: it turned out that this effect
is very small ($\delta A \le 0.03$ on average) because the CASGM
background correction routine, by analysing a region of pure
background, takes the noise pattern into account. Moreover,
in our case, the Moiré pattern appears only when the image
contrast is strongly enhanced, so it is generally unimportant.
Image degradation (see Appendix B) inevitably occurs because,
during a rotation or a non-integer translation, each
new pixel is the average of the original pixels that lie in
that position, each one weighted by its fraction of area. It is
almost impossible to assess the amount of smoothing caused
by image degradation, because it depends on the number of
frames assembled, and on how they overlap. Our numerical
tests suggest that this effect can reduce the background variance
by a factor between $\sim 1$ and 4; this estimate is confirmed
by our measurements on BAT galaxy images, which give
an average reduction factor of 1.5. The plus side is that,
again, background corrections reduce this error, because the
smoothing affects both the galaxy and the background light
distribution.
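The quoted range of the variance reduction (between about 1 and 4) can be recovered with a simple model. The sketch assumes a pure sub-pixel translation of a single frame, area-weighted averaging of the four overlapping input pixels, and uncorrelated background noise with the same variance in every pixel; rotations and the co-addition of several frames are not modelled. The function name is illustrative.

```python
def variance_reduction(dx, dy):
    """Factor by which the background variance drops after a sub-pixel shift.

    dx, dy are the fractional shifts (between 0 and 1) along the two axes.
    Each output pixel is the area-weighted mean of up to four input pixels,
    so its variance is the input variance times the sum of the squared weights.
    """
    if not (0.0 <= dx <= 1.0 and 0.0 <= dy <= 1.0):
        raise ValueError("fractional shifts must lie between 0 and 1")
    weights_x = (1.0 - dx, dx)
    weights_y = (1.0 - dy, dy)
    weights = [wx * wy for wx in weights_x for wy in weights_y]
    return 1.0 / sum(w * w for w in weights)


if __name__ == "__main__":
    for shift in [(0.0, 0.0), (0.5, 0.0), (0.25, 0.25), (0.5, 0.5)]:
        print(shift, round(variance_reduction(*shift), 2))
```

In this model an integer shift leaves the variance unchanged, while a half-pixel shift along both axes gives the largest reduction, a factor of 4.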



A2 SExtractor: pre-processing 

We exploit SExtractor during the pre-processing step, in order
to detect all the sources inside the image and identify those
that need to be masked because they may alter the CASGM analysis
(e.g. bright stars or cosmic rays). The software first examines
the light profile of each source, and then extracts a catalogue
of properties related to its photometry. It is also possible to
save control images, such as the SEGMENTATION map (where all the
pixels belonging to the same source have the same value,
corresponding to the source ID number reported in the catalogue).
To remove undesired sources, we exploit the CLASS_STAR parameter
returned by SExtractor: this specifies whether the light profile
of the source is point-like/stellar (CLASS_STAR ~ 1) or
extended/non-stellar (CLASS_STAR ~ 0). Our pre-processing uses a
simple script that checks the CLASS_STAR value of each source: if
it is greater than 0.1, the script flags the corresponding line
of the catalogue as "star". In the processing phase, all the
stellar sources are masked, according to the outline provided by
the SEGMENTATION image. The automated procedure is efficient, but
not always perfect: the script warns the user in case of
conflicting results, and it is possible to edit the mask and add
custom circular or elliptical masks. Since we want to evaluate
the merger fraction, we do not manually remove any chance
superpositions. The "normal" or "merger" classification is
provided solely by the CASGM analysis.
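
The flagging and masking step described above can be sketched as follows. This is only an illustrative sketch in pure Python, not the PyCASSo script: the catalogue is assumed to be already parsed into dictionaries, and the names flag_stars, mask_sources and the keys "id" and "class_star" are hypothetical. Only the 0.1 threshold comes from the text.

```python
STAR_THRESHOLD = 0.1  # CLASS_STAR cut quoted in the text


def flag_stars(catalogue, threshold=STAR_THRESHOLD):
    """Return the IDs of the catalogue rows whose CLASS_STAR exceeds the threshold."""
    return {row["id"] for row in catalogue if row["class_star"] > threshold}


def mask_sources(image, segmentation, ids_to_mask, fill=None):
    """Copy the image, replacing the pixels owned by flagged sources with `fill`.

    `segmentation` has the same shape as `image`; each pixel holds the ID of
    the source it belongs to (0 for background), as in a SEGMENTATION map.
    """
    return [
        [fill if seg in ids_to_mask else value
         for value, seg in zip(img_row, seg_row)]
        for img_row, seg_row in zip(image, segmentation)
    ]


# Toy data (assumed): source 1 is the galaxy, source 2 a foreground star.
catalogue = [
    {"id": 1, "class_star": 0.02},
    {"id": 2, "class_star": 0.97},
]
segmentation = [
    [0, 1, 1],
    [0, 1, 2],
    [0, 0, 2],
]
image = [
    [0.1, 5.0, 4.0],
    [0.0, 6.0, 9.0],
    [0.2, 0.1, 8.0],
]

stars = flag_stars(catalogue)
masked = mask_sources(image, segmentation, stars)
```

In this toy case only the pixels of source 2 are replaced by the fill value, while the galaxy and the background are left untouched.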

A3 PyCASSo: CASGM analysis 

PyCASSo is in charge of the core of our analysis, that is
the computation of the structural indexes for the automated
classification of galaxies. Our software is entirely written
in Python, making use of standard extension modules
(e.g. NumPy, PyFITS, etc.), and it can be run both
interactively and in batch mode, providing a fast analysis
for each galaxy (footnote 16).

Here we give a concise description of its workflow, while
in the following sections we describe in detail the features
of the software and their implementation. In the development
process we paid particular attention to possible image
degradation effects, so we will also point out some differences
between our implementation and those of other authors
(Conselice 2003; Vikram et al. 2010).
PyCASSo loads all the data computed in the previous steps,
masks the unwanted sources, and subtracts the image background,
providing a "clean" image. Then, it selects the target galaxy
and computes, through a recursive process, its position angle,
Petrosian radius and asymmetry index (footnote 17), and extracts,
according to the definition given in Section 3.1, the aperture
that defines the area of the source (Appendices A3.2 and A3.3).
Using the same aperture, it computes the concentration
(Appendix A3.6) and the clumpiness (Appendix A3.4).
In the case of galaxy pairs, the companion can be included
(partially or entirely) in the aperture: for this to happen, the
two galaxies must be close enough (footnote 18) and somehow
connected to each other (e.g. by a tidal tail or by a luminous
halo), because the cut-off of the aperture is based on the light
profile of the first galaxy, so it cannot extend much beyond its
outline. If there is a clear separation between the light
distributions of the two sources, the aperture fits the first
galaxy tightly and the companion is automatically excluded. Next,
PyCASSo corrects the estimates of asymmetry and clumpiness for
any possible contribution coming from the background (Appendix
A3.5): this step is crucial, since the correction can reduce the
original values by as much as a half. At this point the software
picks up the "clean" image again and extracts the segmentation
map of the galaxy (see Appendix A3.7): the map follows the galaxy
contour and again, in the case of close pairs, it includes both
galaxies. In contrast, all the sources that satisfy the
brightness constraints of the segmentation map, but are not
directly linked with the main galaxy, are masked. The
segmentation map is then used by PyCASSo to compute the Gini
coefficient and the second order momentum of light according to
their definitions. The CASGM indexes and all the other parameters
computed by PyCASSo are collected in an ASCII file. The software
also saves a set of control images and warns the user if the
galaxy size is too small to allow a reliable analysis.

16 For example, PyCASSo takes ~100 s to analyse a 900 × 900
image when running on an Intel Celeron CPU at 2.0 GHz with
2 GB RAM.

17 For the CAS indexes, the centre of the galaxy is the pixel
that minimizes the asymmetry value, so the asymmetry must be
recomputed after each variation in the estimate of either the
position angle or the Petrosian radius of the galaxy.

18 Pairs of galaxies with separation greater than 30 kpc are
unlikely to be detected as mergers. In fact, the CAS aperture
cannot extend much beyond the edge of the first galaxy and in
general the whole CASGM system is sensitive only to pairs of
galaxies close enough to perturb each other.

A3.1 Image preparation

PyCASSo extracts the position and properties (semi-major axis,
position angle, axis ratio) of the galaxy to be analysed from
the SExtractor catalogues generated in the pre-processing phase.
These catalogues (and the associated SEGMENTATION image, see
Section A2) are used to mask out contaminating sources. The
intensity of the background is evaluated as the mode (calculated
as 3 × median − 2 × average; cf. Kendall & Stuart 1977) of the
pixels surviving a recursive sigma-clipping algorithm. This
intensity is subtracted from the masked image, and the result is
used in all the following steps of the CASGM analysis.
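
As an illustration, the mode estimate quoted above can be sketched with the Python standard library. The clipping threshold (3 sigma), the use of the median as the clipping centre and the iteration limit are assumptions of this sketch; the text only specifies the 3 × median − 2 × average estimator and the use of a recursive sigma clipping.

```python
import statistics


def sigma_clip(values, n_sigma=3.0, max_iter=20):
    """Iteratively discard values farther than n_sigma standard deviations
    from the median, until no further value is rejected."""
    kept = list(values)
    for _ in range(max_iter):
        centre = statistics.median(kept)
        spread = statistics.pstdev(kept)
        survivors = [v for v in kept if abs(v - centre) <= n_sigma * spread]
        if len(survivors) == len(kept):
            break
        kept = survivors
    return kept


def background_mode(values, n_sigma=3.0):
    """Mode approximated as 3 x median - 2 x mean of the clipped pixels."""
    kept = sigma_clip(values, n_sigma)
    return 3.0 * statistics.median(kept) - 2.0 * statistics.mean(kept)


# Toy sky: thirty background pixels plus one pixel hit by a bright source.
sky_pixels = [9.0, 10.0, 11.0] * 10 + [500.0]
mode = background_mode(sky_pixels)
clean_pixels = [value - mode for value in sky_pixels]
```

Here the bright pixel is rejected at the first clipping pass, so the estimate is driven by the background pixels only.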

A3.2 Properties of the main galaxy

The CAS method is usually applied on a circular area with
radius 1.5rp (rp is the Petrosian radius, see Section 3.1).
However, PyCASSo can also use elliptical areas, because they are
often better suited to elongated galaxies and close pairs (i.e.
the kind of objects we are most interested in). When using
elliptical areas, we take rp as the semi-major axis of the
ellipse, and we use the axis ratio given in the SExtractor
catalogues. Since the position angle in the SExtractor catalogue
is often inaccurate, we recompute it (by maximizing the flux
inside the elliptical area).
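
A possible way of recomputing the position angle by flux maximization is sketched below. The text does not say how the maximization is carried out, so the grid search over trial angles, the angle convention (measured from the x axis) and the function names are assumptions of this sketch.

```python
import math


def inside_ellipse(x, y, xc, yc, semi_major, axis_ratio, theta):
    """True if pixel (x, y) lies inside the ellipse centred on (xc, yc).

    theta is the position angle in radians, measured from the x axis
    (an assumed convention for this sketch).
    """
    dx, dy = x - xc, y - yc
    along = dx * math.cos(theta) + dy * math.sin(theta)
    across = -dx * math.sin(theta) + dy * math.cos(theta)
    semi_minor = semi_major * axis_ratio
    return (along / semi_major) ** 2 + (across / semi_minor) ** 2 <= 1.0


def aperture_flux(image, xc, yc, semi_major, axis_ratio, theta):
    """Sum of the pixel values falling inside the elliptical aperture."""
    return sum(
        value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if inside_ellipse(x, y, xc, yc, semi_major, axis_ratio, theta)
    )


def best_position_angle(image, xc, yc, semi_major, axis_ratio, step_deg=5):
    """Trial angle (degrees, 0 <= angle < 180) that maximizes the enclosed flux."""
    return max(
        range(0, 180, step_deg),
        key=lambda angle: aperture_flux(
            image, xc, yc, semi_major, axis_ratio, math.radians(angle)
        ),
    )


# Toy image: a thin bar of light along the main diagonal of an 11 x 11 frame.
bar = [[1.0 if x == y else 0.0 for x in range(11)] for y in range(11)]
angle = best_position_angle(bar, 5, 5, 5.0, 0.2, step_deg=45)
```

With the coarse 45 degree grid used in the toy example, only the trial ellipse aligned with the bar encloses more than the central pixel.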

A3.3 Asymmetry

The asymmetry index (A) needs to be calculated first, because
it sets the exact centre of the galaxy to be used in the
following steps. For each possible centre (i.e. for each pixel
in a box of side rp/8 around the SExtractor centre), we obtain
an "aperture image" by masking the pixels outside the 1.5rp
circle/ellipse. Each aperture image is used to estimate the
value of A through equation 3 (in this phase we neglect the
background term, since it is almost independent of the centre
position): the new centre is set to the pixel that minimizes A.
Each time this minimization procedure shifts the centre, we
recompute both rp and the position angle, and repeat the above
procedure using the new parameters. After a stable centre is
found, the value of A needs to be corrected for the background
term in equation 3. Details about this correction are given in
Section A3.5.



In contrast with Conselice (2003) and Vikram et al. (2010),
we do not attempt to estimate the centre position with sub-pixel
accuracy. This is because translations by a fraction of a pixel
(and rotations by angles that are not multiples of 90°) require
image interpolations that tend to smooth (and degrade) the
original image. Our decision is supported by the comparison,
described in Appendix B, between the errors introduced by this
smoothing and those due to the limited precision in the centre
determination.
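
The whole-pixel centre search can be sketched as follows. This is a minimal illustration: it assumes the common Conselice (2003) form of the asymmetry without the background term, sum |I − I180| / sum |I|, which may differ in normalization from equation 3, and it uses a circular aperture and a fixed search box instead of the recursive update of rp and position angle described above. Function names are hypothetical.

```python
def asymmetry(image, xc, yc, radius):
    """Asymmetry inside a circular aperture centred on the integer pixel (xc, yc).

    The 180 degree rotation about an integer pixel maps every pixel exactly
    onto another pixel, so no interpolation is involved.
    """
    ny, nx = len(image), len(image[0])
    numerator = denominator = 0.0
    for y in range(ny):
        for x in range(nx):
            if (x - xc) ** 2 + (y - yc) ** 2 > radius ** 2:
                continue
            xr, yr = 2 * xc - x, 2 * yc - y  # rotated counterpart of (x, y)
            if not (0 <= xr < nx and 0 <= yr < ny):
                continue  # skip pairs that fall outside the frame
            numerator += abs(image[y][x] - image[yr][xr])
            denominator += abs(image[y][x])
    return numerator / denominator if denominator else 0.0


def minimise_centre(image, x0, y0, radius, half_box):
    """Integer pixel, within half_box of (x0, y0), that minimizes the asymmetry."""
    candidates = [
        (x0 + dx, y0 + dy)
        for dy in range(-half_box, half_box + 1)
        for dx in range(-half_box, half_box + 1)
    ]
    return min(candidates, key=lambda c: asymmetry(image, c[0], c[1], radius))


# Toy galaxy: a symmetric "pyramid" of light peaking at pixel (6, 4).
galaxy = [
    [max(0, 5 - max(abs(x - 6), abs(y - 4))) for x in range(11)]
    for y in range(11)
]
centre = minimise_centre(galaxy, x0=5, y0=5, radius=3, half_box=1)
```

Starting from the slightly wrong guess (5, 5), the search recovers the true peak, where the asymmetry of this symmetric toy galaxy vanishes.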



A3.4 Clumpiness

For the clumpiness (S) computation, we create a copy of the
aperture image, and we smooth it with a top-hat filter of width
0.25rp (Conselice 2003), so that the blurring scale is a fixed
fraction of the galaxy size. The smoothed image I_s is
subtracted from the original aperture image I and, according to
equation 4, the intensities of all the positive pixels
(footnote 19) of the residual image are summed; this sum is
multiplied by 10 and normalized by the cumulative intensity of
the same pixels in the aperture image (see Figure A1). In
symbols (see also equation 4),

S = 10 \, \frac{\sum_{i,j} \left[ I(i,j) - I_s(i,j) \right]}{\sum_{i,j} I(i,j)},
\qquad \forall (i,j) \ \text{such that} \ I(i,j) > I_s(i,j).        (A1)



As in the case of the asymmetry A, the clumpiness S needs to
be corrected for the background contribution, whose estimation
is described in the next subsection.
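
Equation (A1) can be illustrated with the following sketch. It assumes a square top-hat (box) kernel truncated at the image edges and ignores both the aperture mask and the exclusion of the central region, so it is a simplified illustration rather than the PyCASSo routine; half_width is a hypothetical parameter expressed in pixels.

```python
def tophat_smooth(image, half_width):
    """Box filter: each pixel becomes the mean of its square neighbourhood
    of side 2 * half_width + 1 (truncated at the image edges)."""
    ny, nx = len(image), len(image[0])
    smoothed = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            window = [
                image[j][i]
                for j in range(max(0, y - half_width), min(ny, y + half_width + 1))
                for i in range(max(0, x - half_width), min(nx, x + half_width + 1))
            ]
            smoothed[y][x] = sum(window) / len(window)
    return smoothed


def clumpiness(image, half_width):
    """S = 10 * sum(I - I_s) / sum(I) over the pixels where I > I_s."""
    smoothed = tophat_smooth(image, half_width)
    numerator = denominator = 0.0
    for row, smooth_row in zip(image, smoothed):
        for value, smooth_value in zip(row, smooth_row):
            if value > smooth_value:  # only positive residuals contribute
                numerator += value - smooth_value
                denominator += value
    return 10.0 * numerator / denominator if denominator else 0.0


# Toy images: a perfectly flat patch, and the same patch with one bright clump.
flat = [[1.0] * 7 for _ in range(7)]
clumpy = [row[:] for row in flat]
clumpy[3][3] = 10.0

s_flat = clumpiness(flat, half_width=1)
s_clumpy = clumpiness(clumpy, half_width=1)
```

The flat patch has no positive residuals and gives S = 0, while the single clump is the only pixel brighter than its smoothed counterpart and dominates the index.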



Figure A1. Examples of images produced by PyCASSo while
estimating the CAS indexes: (a) aperture image enclosing the
elliptical area of the galaxy; (b) residual after the
subtraction of the 180° rotated image (used for the computation
of the asymmetry); (c) smoothing of the aperture image with a
top-hat filter (for the clumpiness computation); (d) residual
image after the subtraction of the smoothed image (for the
clumpiness computation). The dark-blue area around the ellipse
of the galaxy is masked because it is outside the 1.5rp limit.
In images (c) and (d) there is also a central mask excluding the
bulge/nucleus contribution. White areas show the masks applied
by the software to remove foreground stars or other
contaminating sources. Color normalizations differ between the
four images.



A3.5 Background corrections

The simple subtraction of the pixel mode applied in the image
preparation phase guarantees that the image-averaged background
is close to 0, but does not take into account brightness
gradients and the granularity of the image; since these can
affect the values of A and S, a further correction must be
applied (see Conselice 2003). Therefore, PyCASSo computes the
spurious asymmetry and clumpiness in a square box of sky near
the galaxy, and corrects the previous estimates of these
indexes. We adopt the following criteria to identify the best
sky area. (i) The box must be representative of the background,
so it must be as free as possible from sources (stars, galaxies,
etc.). We search only for boxes where at least 80 per cent of
the pixels belong to the background (that is, the pixel
intensity B(x,y) is between −3σ and σ, where σ is the background
standard deviation). Boxes that do not satisfy this requirement
are discarded. (ii) The box must be as close as possible to the
galaxy, because it has to map the local properties of the sky.
Therefore, we start from the edge of the aperture image, and
search for all the boxes that satisfy condition (i) and do not
overlap with the galaxy. If PyCASSo does not find at least five
valid boxes along the loop, it restarts the search on a wider
ring. (iii) The box size should be comparable with the galaxy
size. We initially search for boxes with the same area as the
aperture image. If PyCASSo finds fewer than five boxes that
satisfy criteria (i) and (ii), it reduces the box size by 20 per
cent and repeats the search from the beginning. When a search
for background boxes satisfies all these requirements, PyCASSo
computes the asymmetry of each box, and chooses the box with the
lowest A. On the same box it computes the background
clumpiness. Both these values are normalized by the total
intensity of the galaxy and linearly rescaled with respect to
the galaxy area.

19 The central 0.25rp circular part of the galaxy is excluded
from this computation (Conselice 2003) because it might be
contaminated (e.g. by an AGN). Furthermore, the Conselice (2003)
procedure establishes that all the pixels where the subtraction
of the smoothed image gives a negative result should be forced
to 0.
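
Criterion (i) and the final rescaling can be sketched as below. This is only an illustration under stated assumptions: the acceptance interval is taken as −3σ to σ, as the text reads, and the rescaling is interpreted as a multiplication by the ratio of the galaxy area to the box area. The function names and toy numbers are hypothetical.

```python
def is_valid_sky_box(box, sigma, lower=-3.0, upper=1.0, min_fraction=0.8):
    """Criterion (i): at least min_fraction of the pixels must lie between
    lower * sigma and upper * sigma (interval assumed from the text)."""
    pixels = [value for row in box for value in row]
    n_sky = sum(1 for value in pixels if lower * sigma <= value <= upper * sigma)
    return n_sky / len(pixels) >= min_fraction


def rescaled_correction(box_residual_sum, galaxy_flux, galaxy_area, box_area):
    """Normalize a box residual by the galaxy flux and rescale it linearly
    from the box area to the galaxy area (assumed interpretation)."""
    return box_residual_sum / galaxy_flux * (galaxy_area / box_area)


# Toy boxes with sigma = 1: values of 5.0 mimic pixels hit by a source.
mostly_sky = [[0.0, 0.0, 0.0, 0.0, 5.0],
              [0.0, 0.0, 0.0, 0.0, 5.0]]   # 8 sky pixels out of 10
contaminated = [[0.0, 0.0, 0.0, 5.0, 5.0],
                [0.0, 0.0, 0.0, 0.0, 5.0]]  # 7 sky pixels out of 10

accepted = is_valid_sky_box(mostly_sky, sigma=1.0)
rejected = is_valid_sky_box(contaminated, sigma=1.0)
correction = rescaled_correction(2.0, galaxy_flux=100.0, galaxy_area=400, box_area=100)
```

A rescaling factor far from unity (4 in the toy example) is exactly the situation that the box-size criterion tries to avoid.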

In contrast with other authors (Vikram et al. 2010), we give
more importance to the size of the box than to its proximity to
the galaxy. This choice is motivated by a test (see Table A1)
showing that the asymmetry correction depends on the size of the
box, even after rescaling is taken into account. Therefore, it
is important to select a box of size as similar as possible to
the galaxy, keeping the rescaling factor close to unity.
Small-area asymmetries, rescaled to a much larger size,
generally underestimate the asymmetry correction. In contrast,
the clumpiness correction does not show any trend.

Box side (px)    A_background    S_background
      30            0.097           0.0012
      60            0.163           0.0152
     100            0.182           0.0008
     150            0.186           0.0186
     200            0.195           0.0066
     250            0.212           0.0012
     300            0.218           0.0005
     350            0.223           0.0022

Table A1. Example of the relation between the box size and the
background corrections. The first column shows the size in
pixels of the box side; the second and the third give the
asymmetry and clumpiness corrections, respectively. For this
table we used the NGC 4123 data from the Frei et al. (1996)
catalogue, but we obtained similar results with all the ~10
tested sources. Both the asymmetry and the clumpiness
corrections have already been rescaled to the galaxy size, which
corresponds to a square box of side ~380 pixels.



A3.6 Concentration

The concentration index is defined through the ratio between
the r80 and r20 radii, which contain 80 per cent and 20 per
cent, respectively, of the total flux of the galaxy
(operationally defined as the sum of the counts of all the valid
pixels inside the aperture image). These radii are computed
starting from the centre of the galaxy and considering larger
and larger apertures, until the interior flux reaches 20 per
cent and 80 per cent of the total, respectively; then we compute
C as in equation 2. In the innermost part of the galaxy, where
the brightness profile varies steeply (especially for galaxies
with small angular size), it is important to compute the radii
with high precision (fractions of a pixel). PyCASSo achieves
this by oversampling the image, i.e. by choosing a refinement
factor ref (between 1 and 10), converting each pixel in the
aperture image into a square of ref × ref pixels, each with
intensity equal to ref^-2 times the original value (to preserve
the total flux), and then computing r20 and r80 on this enlarged
image.
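
The oversampling scheme can be sketched as follows. The sketch uses circular apertures, no masking, and assumes the standard Conselice (2003) definition C = 5 log10(r80/r20) for equation 2; function names and the toy Gaussian profile are hypothetical.

```python
import math


def oversample(image, ref):
    """Split every pixel into ref x ref sub-pixels, each carrying
    1 / ref**2 of the original value, so that the total flux is preserved."""
    return [
        [value / ref ** 2 for value in row for _ in range(ref)]
        for row in image
        for _ in range(ref)
    ]


def growth_radii(image, xc, yc, ref=4, fractions=(0.2, 0.8)):
    """Radii (in original pixels) enclosing the given fractions of the flux.

    `fractions` must be in increasing order.
    """
    fine = oversample(image, ref)
    # Centre of the original pixel (xc, yc) in sub-pixel coordinates.
    cx = xc * ref + (ref - 1) / 2
    cy = yc * ref + (ref - 1) / 2
    samples = sorted(
        (math.hypot(x - cx, y - cy) / ref, value)
        for y, row in enumerate(fine)
        for x, value in enumerate(row)
    )
    total = sum(value for _, value in samples)
    radii, running, k = [], 0.0, 0
    for radius, value in samples:
        running += value
        while k < len(fractions) and running >= fractions[k] * total:
            radii.append(radius)
            k += 1
    return radii


def concentration(r20, r80):
    """Assumed standard definition: C = 5 log10(r80 / r20)."""
    return 5.0 * math.log10(r80 / r20)


# Toy galaxy: circular Gaussian profile with sigma = 3 pixels.
profile = [
    [math.exp(-((x - 10) ** 2 + (y - 10) ** 2) / (2 * 3.0 ** 2)) for x in range(21)]
    for y in range(21)
]
r20, r80 = growth_radii(profile, 10, 10, ref=4)
c_index = concentration(r20, r80)
```

For a Gaussian profile the analytic radii are r20 ≈ 2.0 and r80 ≈ 5.4 pixels, so the sketch should return a concentration close to 2.1.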

Our concentration values on the Frei et al. (1996) galaxy
catalogue are consistent within 1σ with those provided by other
authors (Conselice 2003; Vikram et al. 2010); however, we note
that our values of C tend to be lower (by about 6-9 per cent)
than other estimates.

The concentration index is computed directly from the light
profile of the galaxy, so it depends mainly on the background
subtraction. From our tests on the Frei et al. (1996) galaxies,
we see that an inaccuracy of only one per cent in the background
value can lead to an error of up to 10 per cent in the
concentration index (see the example in Table A2).

Other authors use different methods for the background
subtraction, such as exploiting the SExtractor background map
(Vikram et al. 2010), or a fit with a polynomial function
(Conselice 2003; Hernandez-Toledo et al. 2005). However, these
techniques might produce a local overestimate (footnote 20) of
the background in the area covered by the galaxy, and in
particular at its centre, thus increasing the value of rp. As a
test, we analysed images after subtracting the SExtractor
background image (rather than the one calculated by PyCASSo),
and found that C increases by ~5 per cent (on average), making
our results fully comparable with those of Vikram et al. (2010)
and Conselice (2003).

Because of these considerations about background subtraction,
we decided to keep our procedure, which is less subject to
subtle artefacts. We note that the concentration index is not
used for any merging criteria: these small inconsistencies with
other authors do not alter the science results of this paper.

Galaxy      Per cent of the mode     rp      C
NGC 4030            100              58     3.44
                     99              61     3.52
                     98              65     3.61
                     97              69     3.69
NGC 3198            100              94     2.82
                     99             112     3.01

Table A2. Examples of the variation of the Petrosian radius rp
and the concentration index due to the uncertainty in the
background intensity. A variation of a few per cent in the
background estimate may induce variations in the C index of up
to 10 per cent. Both galaxies are taken from the Frei et al.
(1996) catalogue. Conselice (2003) estimated C = 3.67 (NGC 4030)
and C = 3.01 (NGC 3198).

20 SExtractor splits the image and evaluates the background in
each sub-image through sigma clipping. This technique gives good
results for a non-uniform background. However, if a galaxy is
quite extended, it leads to a local overestimate of the
background (because some of the boxes are almost entirely filled
by the object). Therefore, the SExtractor background describes
very well the empty regions of the frame and a possible
brightness gradient, but tends to follow the light distribution
of the sources. The same effect occurs with polynomial fits,
because they tend to follow the intensity peaks produced by the
sources in the image.

A3.7 Segmentation image, Gini coefficient and second order
momentum of light

The Gini coefficient and the second order momentum of light
are not related to any of the CAS indexes. They rely on another
definition of the centre and of the area of the galaxy, defined
through the segmentation map (Lotz et al. 2004). First of all,
by using elliptical apertures, we compute the mean intensity Ip
at the Petrosian semi-major axis ap, and we convolve the image
with a Gaussian filter of width ap/5. This step increases the
S/N ratio of the outer region of the galaxy, facilitating the
identification of low surface brightness features. The
segmentation map is extracted from the original image, using
only those pixels that, in the blurred image, satisfy the
relation Ip ≤ I ≤ Iadj + 10σadj (where I is the pixel intensity,
while Iadj and σadj are, respectively, the median and the
standard deviation of the adjacent pixels), and that are
topologically connected with the main body of the galaxy. The
continuity requirement is quite weak, allowing the segmentation
map to assume a very irregular shape and to follow the galaxy
outline (see Figure A2).
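
The connectivity requirement of the segmentation map can be illustrated with a simple flood fill. This sketch keeps only the lower brightness limit Ip and the topological connection with the main body; the upper limit based on the adjacent pixels and the Gaussian blurring are omitted, and the 8-neighbour connectivity is an assumption.

```python
from collections import deque


def segmentation_map(blurred, seed, threshold):
    """Boolean map of the pixels that are at least as bright as `threshold`
    and connected (through 8 neighbours) to the seed pixel (x, y)."""
    ny, nx = len(blurred), len(blurred[0])
    selected = [[False] * nx for _ in range(ny)]
    sx, sy = seed
    if blurred[sy][sx] < threshold:
        return selected
    selected[sy][sx] = True
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                i, j = x + dx, y + dy
                if (0 <= i < nx and 0 <= j < ny
                        and not selected[j][i]
                        and blurred[j][i] >= threshold):
                    selected[j][i] = True
                    queue.append((i, j))
    return selected


# Toy frame: a galaxy around pixel (2, 2) and a detached source at the right edge.
frame = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 5, 6, 0, 0, 0, 0],
    [0, 6, 9, 5, 0, 0, 7],
    [0, 0, 5, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 0, 0],
]
seg = segmentation_map(frame, seed=(2, 2), threshold=4)
n_pixels = sum(cell for row in seg for cell in row)
```

The detached source is bright enough to pass the threshold, but it is left out because no chain of bright pixels links it to the seed.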
Figure A2. Example of segmentation maps: the white region was
removed because it does not satisfy the brightness lower limit,
or contains sources not connected with the central galaxy. The
map can assume an irregular shape that fits the galaxy outline.
In the case of galaxy pairs (b), if the brightness distribution
does not drop below Ip between one source and the other, they
are both included in the segmentation map.



From the segmentation map we derive the Gini coefficient by
sorting pixels by decreasing intensity, and calculating G as in
equation 6.
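
For illustration, the Gini coefficient can be computed as below. The sketch assumes that equation 6 is the Lotz et al. (2004) form, G = sum_i (2i − n − 1)|X_i| / (|mean X| n (n − 1)), which is written for pixel values sorted in increasing order; the function name is hypothetical.

```python
def gini(pixel_values):
    """Gini coefficient of the absolute pixel values (Lotz et al. 2004 form).

    Requires at least two pixels and a non-zero mean intensity.
    """
    values = sorted(abs(value) for value in pixel_values)  # increasing order
    n = len(values)
    mean = sum(values) / n
    weighted = sum((2 * i - n - 1) * value for i, value in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


g_uniform = gini([1.0, 1.0, 1.0, 1.0])       # light shared equally
g_concentrated = gini([0.0, 0.0, 0.0, 4.0])  # all the light in one pixel
```

The two toy cases bracket the range of the index: 0 when the flux is evenly distributed, 1 when it is all in a single pixel.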

The computation of the momentum M20 is more complicated: it
requires selecting a new centre of the galaxy by minimizing the
value of the total second order momentum Mtot. We start this
search from the CAS centre, and follow the same kind of
procedure described in Section A3.3. When we find a stable
centre, we sort the pixels by decreasing intensity and we
compute M20 using equation 8.
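
The two steps can be sketched as follows, assuming that equation 8 is the Lotz et al. (2004) definition, M20 = log10(sum_i M_i / Mtot) over the brightest pixels containing 20 per cent of the flux, with M_i = f_i [(x_i − xc)^2 + (y_i − yc)^2]. In this sketch the pixel that crosses the 20 per cent level is included in the sum, which is an assumed convention; the names are hypothetical.

```python
import math


def total_moment(pixels, xc, yc):
    """Mtot: flux-weighted sum of the squared distances from (xc, yc).

    `pixels` is a list of (x, y, flux) tuples belonging to the segmentation map.
    """
    return sum(f * ((x - xc) ** 2 + (y - yc) ** 2) for x, y, f in pixels)


def best_centre(pixels, candidates):
    """Candidate centre (x, y) that minimizes the total second order moment."""
    return min(candidates, key=lambda c: total_moment(pixels, c[0], c[1]))


def m20(pixels, xc, yc):
    """log10 of the moment of the brightest 20 per cent of the flux over Mtot."""
    ranked = sorted(
        ((f, f * ((x - xc) ** 2 + (y - yc) ** 2)) for x, y, f in pixels),
        reverse=True,  # brightest pixels first
    )
    f_tot = sum(f for f, _ in ranked)
    m_tot = sum(m for _, m in ranked)
    running_flux = running_moment = 0.0
    for f, m in ranked:
        if running_flux >= 0.2 * f_tot:
            break
        running_flux += f
        running_moment += m
    if running_moment <= 0.0 or m_tot <= 0.0:
        return float("-inf")
    return math.log10(running_moment / m_tot)


# Toy map: one bright pixel close to the centre and eight fainter, distant ones.
toy_pixels = [(2, 0, 25.0)] + [
    (x, y, 10.0)
    for x, y in [(3, 0), (0, 3), (4, 0), (0, 4), (5, 0), (0, 5), (7, 0), (0, 7)]
]
m20_value = m20(toy_pixels, 0, 0)
centre = best_centre([(0, 0, 1.0), (2, 0, 1.0)], [(0, 0), (1, 0), (2, 0)])
```

In the toy map the single bright pixel already holds more than 20 per cent of the flux, so M20 reduces to the logarithm of its moment over the total moment.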

PyCASSo saves the CASGM indexes and all the other computed
quantities (radii, position angles, background properties, etc.)
in two ASCII files. It also saves a series of control images
(e.g. the figures in this Appendix) intended to help the user
verify the correctness of the analysis.

If the galaxy is too small, the resolution might be
insufficient for the CASGM analysis. Resolution and S/N limits
have been studied by Lotz et al. (2004): they found that G, M20
and C are reliable within 10 per cent for images with an average
S/N ratio (per pixel) ⟨S/N⟩ ≥ 2, while A and S decrease
systematically with increasing S/N (but variations are ΔA < 0.1
and ΔS < 0.2 even for ⟨S/N⟩ ~ 5). Low resolution has a stronger
effect, because it flattens the brightness profile of the
galaxy, increasing both the Petrosian radius and the angular
size of the segmentation map. Usually, G, A and S are stable for
spatial resolutions of 1 kpc or better, while C and M20 show a
deviation greater than 15 per cent for spatial resolutions worse
than 0.5 kpc, because the nuclei are no longer resolved. Besides
these limits, we introduce a size requirement: the galaxy must
have a Petrosian semi-major axis ap > 10 pixels, otherwise the
analysis may not be reliable, especially for the clumpiness, the
Gini coefficient and the second order moment of light, because
they rely on a further smoothing of the image.



APPENDIX B: IMAGE DEGRADATION

Quantitative analysis can be distorted even by small image
degradations. Therefore, it is important to use procedures that
minimize image alterations. The most common image-altering
operations performed within the CASGM analysis are translations
and rotations.

Translations involving a shift by an integer number of pixels
do not degrade image quality, because they simply move
intensities from one pixel coordinate to another. Shifts by a
fractional number of pixels, instead, require an interpolation,
i.e. a weighted average of the values of neighbouring pixels.
This has a smoothing effect whose relevance depends on the
number of pixels involved (2 for a translation along one axis;
4 for a translation along both axes) and on the pixel weights
(e.g. a shift by 0.1 pixel introduces less smoothing than a
shift by 0.5 pixel).

Most rotations suffer from the same problem, since the
original and final grids are not superimposable, and an
interpolation/weighted average is applied. Rotations by
multiples of 90°, exactly centred on a pixel, are exceptions,
since they can be obtained without altering the pixel values (by
using image reflections and transpositions): this is
particularly important for the 180° rotation required for the
computation of A, which can be achieved without introducing any
image degradation.

Usually, the CASGM index most affected by unwanted smoothing
is S, but in some cases rp may also be overestimated (because
the averaging makes the light profile of the galaxy flatter),
which in turn influences all the other parameters of the CAS
system.

Conselice (2003) and Vikram et al. (2010) are not very
restrictive in this respect: for example, they commonly use
shifts by fractions of a pixel. While this approach probably
suffers from image degradation, it also presents some
advantages: for example, it allows more precise determinations
of galaxy centres (to the level of ~0.1 pixel rather than ~0.5
pixel), which can lead to better values of A.

For this reason, we carried out tests comparing the loss of
precision due to uncertainties in the centre position with that
due to the smoothing associated with translations and rotations.
The former was evaluated by measuring the asymmetry also in the
eight pixels around the centre, and computing the difference ΔA1
between the asymmetry calculated at the real centre (recall that
the centre was chosen by minimizing A) and the lowest asymmetry
of the neighbouring pixels. The latter was evaluated by looking
at the difference ΔA2 in the asymmetry of the same galaxy before
and after two consecutive translations (in opposite directions,
so that the image returns to its original position) by 0.5
pixels along each axis. We compared the two errors on the
galaxies of the Frei et al. (1996) catalogue: on average,
ΔA1/A ~ 0.056, while ΔA2/A ~ 0.127, i.e. more than double
(footnote 21).

In the light of the results of the above test, we decided
that PyCASSo should minimize image-degradation effects by
applying only integer translations, and computing 180° rotations
through reflections.

21 Had we performed a single "forward" translation, ΔA2 would
be reduced by a factor ~1.5.
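
The smoothing introduced by fractional-pixel translations can be illustrated with a one-dimensional sketch. It assumes linear interpolation and periodic boundaries, which are simplifications chosen for brevity and not necessarily the schemes used by the codes cited above.

```python
import math


def shift(signal, offset):
    """Shift a 1D signal by `offset` pixels (periodic boundaries).

    Integer offsets only move values around; fractional offsets replace each
    value by a weighted average of two neighbouring pixels.
    """
    n = len(signal)
    whole = math.floor(offset)
    frac = offset - whole
    return [
        (1 - frac) * signal[(i - whole) % n] + frac * signal[(i - whole - 1) % n]
        for i in range(n)
    ]


point_source = [0, 0, 4, 0, 0, 0]

# Forward and backward shift by an integer number of pixels: exact.
integer_round_trip = shift(shift(point_source, 2), -2)

# Forward and backward shift by half a pixel: the source is smeared.
half_pixel_round_trip = shift(shift(point_source, 0.5), -0.5)
```

After the half-pixel round trip the source is back at its original position and the total flux is conserved, but the peak value is halved and the light is spread over three pixels, which is the kind of smoothing that affects S and rp.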